System and Methods of Language Processing

ABSTRACT

The disclosure relates to the field of language processing. A server (30) is configured to respond to a query associated with a user device (21) by sending, to the user device (21), an indication of an item selected based on semantic importance attributed to grams of text in the query. Attributing semantic importance comprises: in the event that a number of occurrences of the gram in a first document is above an occurrence threshold, determining a gram score for said gram based on said number of occurrences; in the event that the number of occurrences of the gram in the first document is below the occurrence threshold, determining the gram score based on: (i) said number of occurrences, and (ii) a reference score for the gram based on a number of occurrences of the gram in a reference document different to the at least one first document; and attributing the semantic importance based on the gram score.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/112,227, filed Aug. 24, 2018, which claims the benefit of priority to United Kingdom Patent Application No. 1713728.2, filed Aug. 25, 2017, and European Patent Office Application No. 18156364, filed Feb. 12, 2018, each of which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to the field of language processing.

BACKGROUND

Computers are typically inadequate with regard to processing natural language, e.g. as spoken or written by humans. This is because input in the form of natural language is not provided in any specific format that the computer is programmed to receive; humans do not typically communicate (e.g. speak or write) in the logical and ordered manner which computers are used to, and so computers may be unable to deal with such input. Computers may be designed to use language processing models so as to improve their processing of text input. Language processing models are designed to infer meaning from a string of input text.

Language processing models may be provided which operate using a plurality of pre-set rules which are coded into the computer. These rules may be coded, by a user, based on the user's experience with a language. For example, such rules may be based on their understanding of formal requirements for correct syntax and semantics of a language. However, to provide for a more robust form of language processing, such rules may also include reference to informalities or regional variations in language, such as slang. Consequently, to create a satisfactory language processing model, capable of dealing with a variety of text input, an inordinate number of such rules would be needed. To address this issue, it has been suggested that statistical methods may be used, in combination with a machine learning element, to identify and learn rules based on the language processor's experience with text input.

Such language processing models may provide limited insight into the contribution that any one individual word makes to the meaning, or essence, of a phrase. Such models are unable to derive meaning from natural language or to attribute semantic meaning to language. In particular, such models are unable to discern the impact that context has on use of natural language.

SUMMARY OF THE INVENTION

Aspects and examples of the present disclosure are set out in the claims and aim to address at least these and other technical problems.

In an aspect there is provided a computer-implemented language processing method for attributing scores to grams of text. The method comprises: (i) obtaining a gram of text relating to a first corpus of text; (ii) determining a score for the gram based on a term frequency of the gram in the first corpus; and (iii) in the event that the term frequency is below a threshold value, determining the score based on a term frequency for the gram in a second unrelated corpus of text.

The scores may provide an indication of semantic importance for each gram (e.g. word) of text in a string of words (e.g. a sentence or description). In particular, the scores may enable a computer to identify the importance one word has in a sentence. For instance, the importance of the word may be determined in relation to both a specific context associated with the sentence and a generic context different to the specific context. This may enable the computer to identify a word which appears, when viewed in the specific context, to be very important in the sentence. However, the computer may also determine that when said word is viewed in a generic (e.g. not the specific) context, it is not very important. Based on this finding, the computer may determine that said word is not very important in the sentence. Therefore, a score may be determined for the word which attributes less importance to it than if the score had been determined based on only the specific context.

The scores may be used to determine similarity between item identifiers comprised of such grams of text. They may be used to infer characteristics for such item identifiers. The first corpus of text may be in the form of at least one first document associated with the first context. The term frequency may provide an indication of the number of occurrences. The same method of determining a score may be applied for both the first corpus and the second unrelated corpus of text. The second unrelated corpus of text could be arbitrary. It could be selected to be a generic overview of text for a language, e.g. the English language. It may be selected to comprise normal, non-specific uses of words. It relates to a different context than the first corpus. The context may be specific to a second context or it may be acontextual. Each corpus may have an associated indication for grams of text which occurred in that corpus and the number of times they occurred.

In an embodiment, the method comprises comparing two item identifiers, each comprising at least one gram of text, based on the determined scores for component grams of the item identifiers. The two item identifiers may be associated with the first corpus of text, or the context thereof.

In an aspect a server may be provided for implementing any of the above methods. In particular, the server may be configured to respond to a query by selecting an item identifier (e.g. a sentence describing an item) which is similar to an indication of an item identifier included in the query. The selected item identifier may be identified by comparing item identifiers with the item identifier in the query. The scores may be used for this comparison so that the comparison takes a greater account of the more important words in the sentence than the less important words. The determined importance for the words in the sentence may be based on both the importance of those words when viewed in their specific context, and the importance of those words when viewed out of the specific context. This may enable words not to be attributed over-importance on the basis of them being less frequently used in the specific context. Thus, a comparison of two sentences may be based on a more general interpretation of the language, and so may be impacted less by (or may be less vulnerable to) the impact of specific context on language processing.

For example, in one aspect there is provided a server comprising a data store storing: (i) at least one first document, wherein the at least one first document comprises a plurality of item identifiers, wherein each item identifier comprises at least one gram of text; and (ii) an association between each said gram and a corresponding gram score. The server also comprises a processor coupled to the data store. The server is configured to obtain a new item identifier comprising at least one gram of text, and for each gram in the new item identifier the processor is configured to determine the corresponding gram score for said gram. Determining a gram score comprises:

-   In the event that the number of occurrences of the gram of text in the at least one first document is above an occurrence threshold, determining the gram score based on said number of occurrences.
-   In the event that the number of occurrences of the gram of text in the at least one first document is below the occurrence threshold, determining the gram score based on at least one of: (i) said number of occurrences and (ii) a reference score for said gram. The reference score is based on the number of occurrences of said gram in at least one reference document which is different to the at least one first document.
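The two branches above can be illustrated with a minimal sketch. The disclosure does not fix a scoring formula, so the logarithmic `occurrence_score`, the equal-weight `blend`, the default threshold of 5, and the treatment of a count exactly equal to the threshold as "below" are all assumptions made for illustration only; the function and variable names are likewise hypothetical.

```python
import math


def occurrence_score(count: int, total: int) -> float:
    """Assumed scoring form: the score falls as the number of occurrences rises."""
    return math.log((1 + total) / (1 + count))


def gram_score(gram, first_counts, reference_counts,
               occurrence_threshold=5, blend=0.5):
    """Score a gram from the first document, falling back on a reference
    document when the gram is rare in the first document."""
    first_total = sum(first_counts.values())
    count = first_counts.get(gram, 0)
    first_score = occurrence_score(count, first_total)

    # Branch 1: the gram is frequent enough in the first document.
    if count > occurrence_threshold:
        return first_score

    # Branch 2: combine (i) the first-document score with (ii) a reference
    # score computed the same way from the reference document.
    reference_total = sum(reference_counts.values())
    reference_score = occurrence_score(
        reference_counts.get(gram, 0), reference_total)
    return blend * first_score + (1 - blend) * reference_score


# Hypothetical counts: a menu corpus and a generic reference corpus.
first_counts = {"chicken": 50, "tikka": 30, "tasty": 1}
reference_counts = {"tasty": 500, "chicken": 100, "tikka": 2, "the": 10000}

chicken = gram_score("chicken", first_counts, reference_counts)
tasty = gram_score("tasty", first_counts, reference_counts)
```

In this sketch, "tasty" is rare in the menu corpus but common in the reference corpus, so its blended score comes out lower than a score computed from the menu corpus alone would be.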

The processor is configured to assign semantic meaning to each gram in the new item identifier based on a respective determined gram score. The processor may be configured to update the association in the data store based on the gram score determined in this way. Assigning semantic meaning may comprise attributing semantic value to each of the grams in an item identifier (e.g. a contribution from each gram to the overall meaning of that item identifier). The attributed semantic value (i.e. the contribution a gram makes to the overall meaning of an item identifier) is determined based on the gram score for that gram; it may be the gram score for that gram.

In some embodiments, the server is configured to obtain an indication of a query item identifier, e.g. comprised in a query, comprising at least one gram of text, and to determine, based on the updated data store, at least one query gram score for the query item identifier. For example, this may comprise determining a query gram score for each gram in the query item identifier based on the number of occurrences of that query gram in the at least one first document. These query gram scores may be used for comparison between the query item identifier and an item identifier in the at least one first document so that an item identifier from the at least one first document may be selected to be used when responding to the query.

The processor is configured to select an item identifier from at least one first document based on the at least one query gram score and the association. For example, the processor may use the at least one query gram score when determining the degree of similarity between the query item identifier and an item identifier in the at least one first document. The processor may select the item identifier with the highest degree of similarity. The processor may respond to the query by sending a message to a user device associated with the query. The message may be configured to provide an output at the user device based on the selected item identifier.

As an example, a query may be received which comprises the query item identifier “Chicken Tikka”, e.g. the query comprises a request for chicken tikka. The processor may determine a score for each of these grams based on their number of occurrences in the data store. The processor may also determine a score for grams in the data store. Based on the determined scores for component grams, the processor may compare the item identifier “Chicken Tikka” with a plurality of different item identifiers in the data store to find any similar item identifiers. The processor may determine that “tasty chicken tikka” is similar because “tasty” is a common word in generic English and so should not be considered an important word in that item identifier; thus a comparison of the two item identifiers may determine them to be similar. The server may send a message to the user device (UE) to provide an output indicative of the item “tasty chicken tikka”. For example, this may comprise sending an item associated with the item identifier to a location associated with the UE, for instance in response to a user of the UE accepting or selecting the item, e.g. based on the received indication of the item identifier in response to their query.

This may enable a score (query gram or gram) to be determined for each gram in an item identifier based on the contents of the data store. In particular, each one of such scores will be indicative of the number of occurrences of the corresponding gram in the at least one first document. Furthermore, the server is able to determine a suitable score for grams which do not occur frequently in the at least one first document. These scores may be used when comparing, and selecting suitable, item identifiers. The selected item identifier may be output to a resource.

It is to be appreciated in the context of this disclosure that reference to the use of the ‘number of occurrences of a gram’ may include use of a metric derived from the number of occurrences. For example, where an inverse document frequency value is used, which is calculated based on a number of occurrences for a gram, the processor may determine whether or not the inverse document frequency value is less than a threshold, and determine how to determine the gram score based on a comparison between the inverse document frequency value and a threshold inverse document frequency value.

Where the at least one first document relates to a certain context of item identifiers, the server may be able to identify a gram in a new item identifier which infrequently occurs in the at least one first document.

The number of occurrences of this gram in the at least one reference document may then be determined. The at least one reference document may be acontextual or based on a different context to the at least one first document. In which case, the server may be able to determine that a gram which does not occur frequently in the first document is still a common gram, but it is not associated with the at least one first document (or the context thereof). This then enables the server to determine a score for that gram which encompasses the frequency of that gram in a reference document of a different context. Accordingly, the score determined for that gram may be more representative of the overall frequency of occurrence of that gram in the English (or other) language. Therefore, any comparison or selection of item identifiers based on their gram scores may be less influenced by a low frequency of occurrence for a gram in the (contextual) at least one first document, in the event that said gram occurs more frequently in the reference document. The determined scores (and thus comparisons/selections based thereon) may therefore be able to better cope with the inclusion of acontextual grams in item identifiers, or at least grams from different contexts.

A context may comprise a collection of words, i.e. a subset of vocabulary of a language, which occur in relation to items of that context. For instance, words (or grams of text) associated with a context refer to words which surround a focal point for that context, e.g. they are typically used to describe members of that context. A contextual corpus of text may be defined based on every word used in a set of documents relating to that context. In one example, the context may be restaurants, and the corpus of text comprises each word used on menus for restaurants. In this example, the occurrence of food-specific words may be disproportionately higher than usual, and the occurrence of certain common words which are not applicable to food may be lower than usual. The reference document may provide an acontextual corpus of text, for instance one which is indicative of the language as a whole such that the frequency with which words occur is representative of a language as a whole, and so e.g. food-specific words will occur less frequently than they would in a food-specific corpus of text. The reference document may therefore provide a normalisation for the number of occurrences of a gram of text in a language as a whole.

The at least one first document may be indicative of a corpus of text. The corpus of text may be associated with a first context. The at least one first document may comprise a plurality of first documents, and the corpus of text is representative of all of the documents. The at least one first document may represent a corpus of text associated with a plurality of other documents. The first document may be in the form of a look up table, wherein each tuple provides an indication of the gram and the number of occurrences. Alternatively, the number of occurrences may be determined on-the-fly, e.g. using a word counting system.
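A look up table of this kind could be built along the following lines. This is only a sketch: the whitespace tokenisation, the lower-casing, and the representation of each first document as a list of item identifier strings are assumptions, and `count_grams` is a hypothetical helper name.

```python
from collections import Counter


def count_grams(first_documents):
    """Build a gram -> number-of-occurrences table over all first documents.

    Each document is assumed to be a list of item identifier strings, and
    each gram is assumed to be a whitespace-separated, lower-cased word.
    """
    counts = Counter()
    for document in first_documents:
        for item_identifier in document:
            counts.update(item_identifier.lower().split())
    return counts


# Hypothetical menus, each acting as one first document.
menus = [
    ["Chicken Tikka", "Tasty Chicken Tikka"],
    ["Lamb Tikka"],
]
gram_counts = count_grams(menus)
```

A `Counter` returns zero for a gram it has never seen, which is convenient when a new item identifier contains a gram absent from the corpus.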

Each item identifier may be indicative of an item, e.g. a physical item. Each document may be indicative of a facility and its item identifiers indicative of items associated with, or available at, that facility. Each gram may be a portion of text comprised within the item identifier. The item identifier may be a string of words describing its item. Selecting an item identifier may enable the server to send an indication of an item to the user, wherein the indication is of an item corresponding to the item identifier and a selected first document (or facility) associated with that item. Each gram may comprise alphanumeric code, for instance it may comprise letters or letters and numbers, or numbers. The association may be between n-grams and corresponding gram scores. For instance, it may comprise an indication of item identifiers and corresponding scores.

Each gram score may provide an indication of the number of occurrences of the corresponding gram in the at least one first document, e.g. in a corpus defined by the at least one first document. The gram score is determined based on the number of occurrences; it may be the number of occurrences. The data store may store an association between each gram and a number of occurrences of that gram. Obtaining the new item identifier may comprise receiving it from a user device. For instance, the user device may be associated with a facility, it may be associated with a user of a facility. The new item identifier may be received in a new item identifier message. The new item identifier may comprise an indication of an item identifier. For instance, the new item identifier may be associated with one of the at least one first documents. In such cases, the server may be configured to add the new item identifier to its corresponding first document.

The occurrence threshold may represent a selected criterion, and the gram score being above the occurrence threshold represents the selected criterion being satisfied. For example, gram scores may be in the form of a numeric value, and the occurrence threshold may be in the form of a numeric value, above which the criterion is satisfied. This value may be selected by a user. For example, this selection may be based on a certain degree of significance or probability.

Determining the gram score based on the number of occurrences may comprise, in the event that the gram matches a stored gram (e.g. above a selected degree of similarity), determining the gram score based on the gram score associated with the stored gram. For instance, it may be determined as that gram score. In the event that a stored gram is associated with a number of occurrences of that gram, and the stored gram is determined to be similar to the new gram, the gram score for the new gram may be determined based on the stored number of occurrences for that gram. Determining the gram score may comprise determining an inverse document frequency count for that gram. This comprises determining a score which decreases in proportion with any increases to the number of occurrences in the at least one first document. Determining the reference gram score comprises performing the same method steps, but based on the at least one reference document rather than the at least one first document.
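Matching a new gram against stored grams "above a selected degree of similarity" might look like the sketch below. The disclosure does not say how gram similarity is measured; character-level matching via Python's standard `difflib` module is a swapped-in technique chosen purely for illustration, and the 0.8 cutoff, the stored scores, and the name `score_from_stored` are hypothetical.

```python
import difflib


def score_from_stored(new_gram, stored_scores, min_similarity=0.8):
    """Return the score of the closest stored gram, or None if no stored
    gram is similar enough to the new gram."""
    matches = difflib.get_close_matches(
        new_gram, list(stored_scores), n=1, cutoff=min_similarity)
    if not matches:
        return None
    return stored_scores[matches[0]]


# Hypothetical association between stored grams and their gram scores.
stored_scores = {"chicken": 0.47, "tikka": 0.97}

exact = score_from_stored("chicken", stored_scores)
misspelt = score_from_stored("chiken", stored_scores)
unknown = score_from_stored("pizza", stored_scores)
```

Here a misspelt gram inherits the score of its near match, while a gram with no sufficiently similar stored counterpart returns `None`, at which point a fresh score would need to be determined.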

The at least one reference document is configured to be associated with a different context to the at least one first document. For instance, it may be an acontextual document, which could comprise an arbitrary, or selected, corpus of text. The reference document being different to the first document may comprise the reference document being of a different context to the first document; it may comprise the contents (i.e. the item identifiers) being different (e.g. not identical).

Updating the association may comprise adding an indication of the new gram and/or its determined gram score to the association. In the event that the new item identifier is associated with a first document, it may comprise adding that item identifier to the first document. It may also comprise adding each of the grams to the association. In the event that the new item identifier comprises a gram which does not correspond to any item gram in the data store, it may comprise adding an indication of the new gram and its corresponding determined gram score to the association. Thus, updating the association based on the at least one gram may comprise adding the new gram. Updating the association may also comprise updating the corresponding gram score for a gram. For instance, where the gram score is indicative of the number of occurrences of that gram, updating the gram score may comprise updating based on the inclusion of another occurrence of that gram.

Determining based on the updated data store may comprise determining based on the contents of the data store, which includes the newly added gram, and its corresponding determined gram score. For instance, determining the query gram score may comprise a look-up in the data store, and in the event that there is a gram in the association corresponding to the query gram, obtaining the query gram score based on the gram score associated with that gram. The indication of the query item identifier may be comprised in a query, or for example, a query may have been received at another component not part of the server, and an indication of the query and/or the contents thereof was communicated to the server so that relevant query gram scores may be determined. The query item identifier may comprise a plurality of grams of text. These grams may be related to the context of the at least one first document, although some may not. The processor is configured to determine a query gram score for each of the grams in the query item identifier.

The selection may be based on the query gram scores stored in the association. It may include a comparison based at least in part on query gram scores with gram scores. Where the at least one first document comprises a plurality of documents, selecting may comprise selecting one of the first documents, based on that first document comprising an item identifier comprising grams which correspond to the query item identifier. For instance, the server may be configured to select several item identifiers, or several first documents, and send them all to the user device, wherein a user at the user device may be configured to select one of them. In an embodiment, selecting the item identifier is based on a degree of similarity between the query item identifier and each of a plurality of item identifiers in the at least one first document. This may enable item identifiers to be selected which are chosen based on their associated gram scores, which have been determined in the above manner and as such provide for an improved comparison and/or selection process.

The selection may be determined based on each of a plurality of comparisons, wherein each gram in the query item identifier is compared with each gram in each of the plurality of item identifiers in the at least one first document. For instance, this may comprise selecting an item identifier which has the highest degree of similarity with the query item identifier. This similarity may be a combined similarity based on each of its component grams, or it may be an overall similarity between the two item identifiers. The processor is configured to determine the degree of similarity. The comparison between the query item identifier and item identifiers in the association may comprise comparing the query item identifier (its component grams) with each of the item identifiers (and their component grams) in the first document.

In an embodiment, determining at least one query gram score comprises determining, for each gram in the query item identifier, a corresponding query gram score for said gram. Determining the at least one query gram score may be thought of as determining a query score, wherein the query score is based on (e.g. comprises) each individual query gram score for the query grams in the query item identifier. The query gram score is determined in the same manner as for the gram score. In an embodiment, the processor is configured to determine the degree of similarity between an item identifier in a first document and the query item identifier based on the corresponding gram score for each of the at least one grams in the item identifier and the corresponding query gram score for each of the at least one grams in the query item identifier. The degree of similarity is determined using each of said grams and said query grams.

In an embodiment, the server is configured to select an item identifier from one of the first documents in the event that the degree of similarity is above a similarity threshold. The similarity threshold comprises a threshold criterion such that in the event that the criterion is satisfied, the item identifier may be selected, and in the event that it is not, the item identifier is not selected. The degree of similarity may be a numerical value. The threshold may be a numerical value, above which the item identifier is selected. The selection may be determined based on identifying the item identifier with the highest value for the degree of similarity. In the event that there is not a degree of similarity above a similarity threshold, the server may be configured to return to the user a selection of at least one item identifier, wherein the at least one item identifier is selected to have a degree of similarity with the query item identifier which is as high as possible out of the possible item identifiers to be selected.
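The threshold test and its fallback can be sketched as follows, assuming the degrees of similarity have already been computed and are held in a dictionary keyed by item identifier. The function name `select_items`, the example values, and the choice to return every identifier above the threshold in descending order are illustrative assumptions.

```python
def select_items(similarities, similarity_threshold):
    """Select item identifiers whose similarity exceeds the threshold.

    If none exceeds it, fall back to the single identifier with the
    highest similarity, so that the query still receives a response.
    """
    if not similarities:
        return []
    above = [item for item, value in similarities.items()
             if value > similarity_threshold]
    if above:
        # Highest similarity first.
        return sorted(above, key=lambda item: similarities[item], reverse=True)
    return [max(similarities, key=similarities.get)]


# Hypothetical precomputed degrees of similarity to a query item identifier.
good_matches = select_items(
    {"tasty chicken tikka": 0.9, "lamb tikka": 0.4, "chicken tikka masala": 0.85},
    similarity_threshold=0.8)
fallback = select_items(
    {"lamb tikka": 0.4, "garlic naan": 0.1}, similarity_threshold=0.8)
```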

In an embodiment, the processor is configured to determine the degree of similarity based on a weighted combination of the grams in the query item identifier. The weighted combination comprises the processor scaling each query gram score so that its contribution to a total gram score is controlled based on a selected degree of scaling. For instance, the contribution of a gram may be reduced in the event that the gram is associated with a lower gram score.

In an embodiment, each of the at least one grams of the query item identifier is weighted based on its corresponding query gram score. This may enable the scores determined by the processor to be used to balance the weightings applied when determining the degree of similarity and selecting item identifiers. The weighting may comprise scaling the contribution of a query gram based directly on its gram score. For instance, the query gram score may be applied as a numerical coefficient. The ratio of contribution from each query gram to the total score is representative of the query gram scores; it may be proportionate to them, or it may be equal to them. The contribution relates to the semantic importance attributed to a gram. In an embodiment, the processor is configured to determine the degree of similarity between the query item identifier and an item identifier in a first document based on a weighted combination of the at least one gram in said item identifier. In an embodiment, each of the at least one grams of the item identifier is weighted based on its corresponding gram score.
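One way to realise a weighted combination in which gram scores act as numerical coefficients is a weighted overlap (a weighted Jaccard measure). The disclosure does not name a particular similarity measure, so this choice is an assumption, as are the example scores and the name `weighted_similarity`.

```python
def weighted_similarity(query_grams, item_grams, gram_scores):
    """Weighted overlap between two item identifiers.

    Each gram contributes its gram score as a weight: the result is the
    total weight of shared grams divided by the total weight of all grams
    appearing in either identifier.
    """
    query_set, item_set = set(query_grams), set(item_grams)
    total = sum(gram_scores.get(g, 0.0) for g in query_set | item_set)
    if total == 0:
        return 0.0
    shared = sum(gram_scores.get(g, 0.0) for g in query_set & item_set)
    return shared / total


# Hypothetical gram scores: "tasty" carries little semantic importance.
gram_scores = {"chicken": 2.0, "tikka": 3.0, "tasty": 0.5, "lamb": 2.5}

near_match = weighted_similarity(
    ["chicken", "tikka"], ["tasty", "chicken", "tikka"], gram_scores)
weak_match = weighted_similarity(
    ["chicken", "tikka"], ["lamb", "tikka"], gram_scores)
```

With these assumed scores, the low-weight gram "tasty" barely reduces the similarity between "chicken tikka" and "tasty chicken tikka", whereas swapping "chicken" for the highly weighted "lamb" reduces it substantially.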

In an embodiment, the processor is configured so that a value for the degree of similarity at the similarity threshold is less than a value for the degree of similarity between two identical item identifiers. The value may be a numerical value, e.g. a continuous or a discrete value. The similarity threshold may also be represented by a value. Two identical item identifiers comprise a first and second item identifier, wherein each item identifier consists of the same number of grams and each gram has an exact counterpart in the other item identifier. The degree of similarity at the similarity threshold is less than this value such that two non-identical item identifiers may be determined to satisfy the similarity threshold. For instance, a first item identifier may comprise the exact second item identifier, as well as at least one other gram. This may be determined to satisfy the similarity threshold.

This may enable the server to determine that two item identifiers have a sufficiently high degree of similarity even though the two item identifiers are not identical. In particular, when the comparison is determined in combination with the weighting system described above, this may enable the system to utilise the determined weightings to assess the degree of similarity, so that, for example, highly weighted grams of text contribute more to the overall determination of the degree of similarity. Thus, two item identifiers which appear to differ significantly, e.g. having only one shared gram of text, may be determined to be similar above the similarity threshold because the contribution of the shared gram carried with it a high weighting.

In an embodiment, a plurality of first documents are stored in the data store, each comprising a plurality of item identifiers. The indication of a query item identifier comprises an indication of a plurality of query item identifiers each comprising at least one gram of text. The plurality of first documents may form a corpus of text, each being a portion of the corpus, wherein the corpus is defined based on the inclusion of each item identifier in each of a plurality of the first documents. The data store may comprise an indication of each different gram of text found in each of the first documents. This may be stored associated with the number of occurrences for that gram. One of the plurality of first documents may comprise a summary document comprising an association between each of the grams of text found in the plurality of other first documents associated with the number of occurrences corresponding to each gram. The indication of the plurality of query item identifiers may comprise an indication of a quantity associated with each query item identifier. Each query item identifier may be separate. The indication of the plurality of query item identifiers may be received in a query message which comprises an association between a plurality of query item identifiers and a corresponding indication of quantity.

In an embodiment, the processor is configured to:

-   Determine query gram scores for each of the at least one grams in each of the query item identifiers.
-   Determine a degree of similarity between each of the query item identifiers and each of a plurality of item identifiers in the first documents.
-   Select at least one first document out of the plurality of first documents based on determined degrees of similarity for all of the query item identifiers.

Selecting the at least one first document out of the plurality of first documents based on determined degrees of similarity may comprise accounting for each individual degree of similarity. For instance, the first document may be selected based on it being the document with a highest overall degree of similarity, wherein the overall degree of similarity is determined based on a combination of each of the individual degrees of similarity. The contribution of each individual degree of similarity may be scaled based on an indication of quantity associated with a query item identifier. The selection may be the document comprising the greatest number of item identifiers which have a degree of similarity above the similarity threshold when compared with the query item identifiers. A combination of approaches may be used.
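Document selection by overall degree of similarity, with each query item identifier scaled by its quantity, can be sketched as below. For brevity the per-item similarity here is a plain unweighted word overlap, which is a stand-in rather than the gram-score weighting described earlier; the facility names, menus, and function names are hypothetical.

```python
def word_overlap(a: str, b: str) -> float:
    """Stand-in similarity: shared words divided by all words in either string."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def select_document(first_documents, query_quantities, similarity=word_overlap):
    """Pick the first document with the highest quantity-scaled overall similarity.

    first_documents: document name -> list of item identifiers.
    query_quantities: query item identifier -> requested quantity.
    """
    best_name, best_total = None, float("-inf")
    for name, item_identifiers in first_documents.items():
        total = 0.0
        for query_item, quantity in query_quantities.items():
            # Best match for this query item within the document.
            best = max((similarity(query_item, item)
                        for item in item_identifiers), default=0.0)
            total += quantity * best
        if total > best_total:
            best_name, best_total = name, total
    return best_name, best_total


first_documents = {
    "Curry House": ["tasty chicken tikka", "garlic naan"],
    "Pizza Place": ["margherita pizza", "garlic bread"],
}
query_quantities = {"chicken tikka": 2, "garlic naan": 1}
selected, overall = select_document(first_documents, query_quantities)
```

Counting the number of item identifiers above the similarity threshold, as also mentioned above, would be an alternative selection rule that could replace or be combined with the summed total.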

In an embodiment, the indication of a query item identifier is associated with a UE. This may be determined based on the message being received from the UE. It may be determined based on an indication of the user device being obtained with the query item identifiers. In an embodiment, the server is configured to send, to the UE, an indication of the at least one selected first document. The indication of the at least one selected first document may be sent to the UE in a selection message. This may include sending the first document associated with that facility; it may include sending an indication of the facility associated with that first document; it may include sending an indication of the selected items from that first document; or any combination of the above. In an embodiment, each of a plurality of the plurality of first documents is associated with a corresponding facility. For instance, item identifiers may be representative of items associated with that facility. Each item identifier may provide an indication of a respective item.

In an embodiment, in the event that a first document is selected, the server is configured to send an indication of the plurality of selected item identifiers to the facility corresponding to said first document. It may comprise sending an indication of the query item identifiers as well (e.g. in cases where there was not a direct match between them) as this may help alert the facility to the specifics of the original request that has since been re-directed to them. This may be sent to the facility in a facility message. The indication of the plurality of query item identifiers may comprise an indication of item identifiers associated with a first document associated with the facility which have a determined degree of similarity above the similarity threshold when compared with the query item identifiers.

In an embodiment, the processor is configured to send the indication of the plurality of query item identifiers to the facility corresponding to said first document in the event that the server receives an acceptance message from the UE in response to the indication of the at least one selected first document sent to the UE. The acceptance message may comprise an indication that a user associated with the UE accepts item identifiers associated with the facility. The selection message may comprise an indication of several facilities and/or item identifiers associated with the facilities. In which case, the selection message may comprise an indication of selection by the user of a facility and/or selection of item identifiers. In an embodiment, the acceptance message comprises a selection of one of the at least one selected first documents, and the indication of the plurality of query item identifiers is sent to the facility corresponding to the one first document. In an embodiment, selecting the at least one first document comprises selecting each first document comprising, for each of the query item identifiers, an item identifier having a degree of similarity above the similarity threshold. In the event that none of the item identifiers satisfy the similarity threshold, a first document may be selected which comprises item identifiers with the highest value for the degree of similarity.

In an embodiment, selecting the at least one first document comprises selecting one first document, the one first document being determined based on a combination of each of the degrees of similarity for the query item identifiers. This may include selecting the item identifiers associated with the one first document. Selecting a first document may comprise selecting a facility associated with said first document to be a facility from which the item identifiers are to be selected. In an embodiment, selecting the one first document comprises selecting the document with the highest combination. The combination comprises an average of numeric values for each of said degrees of similarity. In an embodiment, the processor is configured to determine the corresponding gram score for a gram of text based on the total number of item identifiers in the data store. In an embodiment, in the event that the reference score for a gram of text is greater than a reference threshold value, the processor is configured to determine a lower gram score for said gram. The reference score being greater than the reference threshold value provides an indication that the gram occurs frequently in the reference document. Determining the lower gram score may comprise reducing a value for the gram score.

In an embodiment, the lower gram score is determined based on the reference score. For example, the reduction in the value for the gram score may be determined based on the value for the reference score. The gram score may be scaled based on a value for the size of the reference score. In an embodiment, determining the gram score for a gram of text comprises reducing the value of the gram score in the event that an original value for the gram score would be larger than a gram threshold. This comprises setting an upper limit threshold to a value for the gram score. In the event this upper limit is exceeded by a value determined for the gram score, then the value for the gram score may be scaled so that it is determined to be a lower value. The gram threshold may be thought of as a criterion, scores greater than it thus not satisfying the criterion. In an embodiment, the processor is configured to determine the size of the reduction to the gram score based on the difference between the original value for the gram score and the gram threshold. The gram score may be determined such that the difference between the originally determined gram score and the gram threshold may be subtracted from the gram threshold, and that value provides the gram score.
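As a minimal sketch of the gram threshold adjustment just described, the excess of the original gram score over the gram threshold may be subtracted from the threshold itself. The function name and the numeric values are hypothetical and chosen only to illustrate the arithmetic.

```python
def limit_gram_score(original_score, gram_threshold):
    """Reduce a gram score that exceeds the gram threshold.

    Scores at or below the threshold are left unchanged. For a score above
    the threshold, the excess (original - threshold) is subtracted from the
    threshold, so a larger excess gives a lower final score.
    """
    if original_score <= gram_threshold:
        return original_score
    excess = original_score - gram_threshold
    return gram_threshold - excess


unchanged = limit_gram_score(4.0, 5.0)  # below the threshold
reduced = limit_gram_score(7.0, 5.0)    # 5.0 - (7.0 - 5.0)
```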

In an embodiment, the data store comprises a plurality of trigger grams, wherein the processor is configured to detect a trigger gram in an item identifier and to alter the gram score of any subsequent grams in the item identifier. Trigger grams comprise grams which the processor is configured to identify and, in response to said identification, cause a selected action to occur. The stored trigger grams are selected to represent grams the use of which provides an indication of natural breaks or subjunctive clauses in an item identifier. For instance, grams which occur after the trigger gram in the item identifier may have their respective gram scores determined based on their location in the item identifier. For example, the scores are indicative of the nature of the trigger gram.

In an embodiment, altering the gram score comprises at least one of: reducing the gram score and separating the item identifier into two portions, a first portion for the grams before the trigger gram and a second portion for the grams after the trigger gram. Reducing the gram score comprises reducing the contribution of the subsequent grams to an overall score for the item identifier. For instance, each portion may have a portion score associated therewith which is determined based on the component gram scores of the portion. The processor may be configured to apply a weighting to each of these portion scores. The weighting applied to the portion after the trigger gram may be configured to reduce that portion score. The processor may be configured to compare two item identifiers based on a comparison between the first portion scores in each respective item identifier. This may enable less important clauses within an item identifier to be considered less, so that a comparison between two item identifiers is focused on primary clauses within the item identifier.
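The portion-based handling of trigger grams might be sketched as below. The trigger grams, the weighting of 0.5 applied to the second portion, the use of a simple sum as the portion score, and the gram scores are all assumptions made for illustration.

```python
# Hypothetical trigger grams; the disclosure does not list specific ones.
TRIGGER_GRAMS = {"with", "served"}


def split_at_trigger(grams, trigger_grams=TRIGGER_GRAMS):
    """Split grams into the portion before and after the first trigger gram."""
    for index, gram in enumerate(grams):
        if gram in trigger_grams:
            return grams[:index], grams[index + 1:]
    return grams, []


def identifier_score(grams, gram_scores, second_weight=0.5):
    """Sum the portion scores, down-weighting the portion after the trigger."""
    first, second = split_at_trigger(grams)
    first_score = sum(gram_scores.get(g, 0.0) for g in first)
    second_score = sum(gram_scores.get(g, 0.0) for g in second)
    return first_score + second_weight * second_score


grams = ["chicken", "burger", "with", "fries"]
gram_scores = {"chicken": 2.0, "burger": 3.0, "fries": 4.0}
portions = split_at_trigger(grams)
score = identifier_score(grams, gram_scores)
```

Here "fries" contributes only half of its gram score, so a comparison based on these scores is focused on the primary clause "chicken burger".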

In an embodiment, the processor is configured to determine the degree of similarity based on a cosine similarity. Cosine similarity comprises transforming a gram of text into a co-ordinate in an n-dimensional vector space, for example using a word embedding. This may be referred to as a ‘semantic similarity’. An item identifier may then be located at a position indicative of a combination of the determined positions for each of its component grams. This may be a sum; it may be a weighted sum based on each respective gram score. The location of two item identifiers may then be determined, and the cosine of the angle between them may be determined to provide an indication of similarity between the two item identifiers, based on the direction of their location in the vector space. An indication of these locations, e.g. co-ordinates, may be stored in the data store associated with the relevant item identifier.
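A minimal sketch of the ‘semantic similarity’ described above is given below, assuming a toy two-dimensional embedding table. The embedding values, the function names and the default weighting of 1.0 for grams without a stored gram score are hypothetical; a practical system would use a trained word embedding.

```python
import math


def identifier_vector(grams, embeddings, gram_scores):
    """Locate an item identifier as the gram-score-weighted sum of gram vectors."""
    dimensions = len(next(iter(embeddings.values())))
    vector = [0.0] * dimensions
    for gram in grams:
        if gram not in embeddings:
            continue  # grams without an embedding are skipped in this sketch
        weight = gram_scores.get(gram, 1.0)
        for i, component in enumerate(embeddings[gram]):
            vector[i] += weight * component
    return vector


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy embedding: "chicken" and "hen" point in similar directions.
embeddings = {"chicken": [1.0, 0.0], "hen": [0.9, 0.1], "pizza": [0.0, 1.0]}
chicken = identifier_vector(["chicken"], embeddings, {})
hen = identifier_vector(["hen"], embeddings, {})
pizza = identifier_vector(["pizza"], embeddings, {})
```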

In an embodiment, the processor is configured to determine the degree of similarity based on a cosine similarity between the gram weightings for a first item identifier and the gram weightings for a second item identifier. In this case, the co-ordinates for each item identifier are specified by weightings associated with grams in said item identifier, i.e. the respective gram scores. This may be referred to as a ‘text similarity’.
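For illustration, the ‘text similarity’ may be sketched by treating each distinct gram as one dimension and the gram score as the co-ordinate along that dimension. The function name and the gram scores below are hypothetical.

```python
import math


def text_similarity(weights_a, weights_b):
    """Cosine similarity between two gram-to-weighting mappings.

    Each distinct gram is treated as one axis; a gram absent from an item
    identifier has a co-ordinate of zero on that axis.
    """
    shared = set(weights_a) & set(weights_b)
    dot = sum(weights_a[g] * weights_b[g] for g in shared)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


chicken_pizza = {"chicken": 1.0, "pizza": 2.0}
chicken_burger = {"chicken": 1.0, "burger": 2.0}
similarity = text_similarity(chicken_pizza, chicken_burger)
```

Only the shared gram "chicken" contributes to the numerator, so two item identifiers which share only a low-scoring gram receive a low degree of text similarity.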

In an embodiment, determining the degree of similarity comprises: first, determining a degree of text similarity between a first item identifier and a second item identifier. In the event that the degree of text similarity satisfies a text similarity threshold criterion, the degree of similarity is determined based on this degree of text similarity. In the event that the degree of text similarity does not satisfy the text similarity threshold criterion, the processor is configured to determine a degree of semantic similarity between the first item identifier and the second item identifier. In this case, the degree of similarity is determined based on the semantic similarity. Satisfying the text similarity threshold criterion may include determining that the degree of text similarity is greater than a selected threshold. This value may then be taken as the degree of similarity.
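The two-stage determination above might be sketched as follows. The similarity functions are passed in as parameters so that the sketch stays self-contained; the parameter names and the threshold value of 0.8 are assumptions.

```python
def degree_of_similarity(first, second, text_sim, semantic_sim,
                         text_threshold=0.8):
    """Use text similarity when it satisfies the threshold criterion,
    otherwise fall back to semantic similarity."""
    text_value = text_sim(first, second)
    if text_value > text_threshold:
        # Criterion satisfied: the text similarity is taken as the result.
        return text_value
    # Criterion not satisfied: determine the semantic similarity instead.
    return semantic_sim(first, second)
```

One possible motivation for this ordering, not stated in the text, is that the text similarity needs only the stored gram scores, so the embedding-based comparison is performed only when the textual comparison is inconclusive.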

In an aspect there is provided a server for processing a query item identifier. The server comprises a data store storing: (i) at least one first document, wherein the at least one first document comprises a plurality of item identifiers, wherein each item identifier comprises at least one gram of text, and (ii) an association between each gram and a corresponding gram score. The server also comprises a processor coupled to the data store. The server is configured to obtain an indication of a query item identifier comprising at least one gram of text. The processor is configured to determine at least one query gram score for the query item identifier. Determining the query gram score for a gram in the query item identifier comprises:

-   In the event that the number of occurrences of the gram of text in the at least one first document is above an occurrence threshold, determining the query gram score based on said number of occurrences.
-   In the event that the number of occurrences of the gram of text in the at least one first document is below the occurrence threshold, determining the query gram score based on said number of occurrences and a reference score for said gram. The reference score is based on the number of occurrences of said gram in at least one reference document which is different to the at least one first document.

The processor is configured to select an item identifier from at least one first document based on the at least one query gram score and the association.

In an aspect there is provided a computer-implemented method of selecting item identifiers in response to a query. The method comprises:

-   Obtaining a new item identifier comprising at least one gram of text.
-   Determining a gram score for each gram of text in the new item identifier. Determining the gram score comprises: (i) in the event that the number of occurrences of the gram of text in the at least one first document is above an occurrence threshold, determining the gram score based on said number of occurrences; and (ii) in the event that the number of occurrences of the gram of text in the at least one first document is below the occurrence threshold, determining the gram score based on said number of occurrences and a reference score for said gram. The reference score is based on the number of occurrences of said gram in at least one reference document which is different to the at least one first document.
-   Updating an association in a data store based on the at least one gram in the obtained item identifier and its corresponding determined gram score. The data store stores: (i) at least one first document, wherein the at least one first document comprises a plurality of item identifiers, wherein each item identifier comprises at least one gram of text, and (ii) the association between each gram and a corresponding gram score.
-   Obtaining an indication of a query item identifier comprising at least one gram of text, and determining at least one query gram score for the query item identifier based on the updated data store.
-   Selecting an item identifier from at least one first document based on the at least one query gram score and the association.

In an aspect there is provided a computer-implemented method for processing a query item identifier. The method comprises:

-   Obtaining an indication of a query item identifier comprising at least one gram of text.
-   Determining at least one query gram score for the query item identifier. Determining the query gram score comprises: (i) in the event that the number of occurrences of the gram of text in the at least one first document is above an occurrence threshold, determining the query gram score based on said number of occurrences; and (ii) in the event that the number of occurrences of the gram of text in the at least one first document is below the occurrence threshold, determining the query gram score based on said number of occurrences and a reference score for said gram. The reference score is based on the number of occurrences of said gram in at least one reference document which is different to the at least one first document.
-   Selecting an item identifier from at least one first document stored in a data store, wherein the data store stores: (i) the at least one first document, wherein the at least one first document comprises a plurality of item identifiers, wherein each item identifier comprises at least one gram of text, and (ii) an association between each gram and a corresponding gram score.
-   Selecting the item identifier based on the at least one query gram score and the association.

In embodiments, the number of occurrences for a gram in the at least one first document comprises a total number of item identifiers in the at least one first document which include one or more occurrences of said gram. This may enable grams which occur more than once in an item identifier to have a gram score determined therefor which takes account of this and does not determine the number of occurrences of that gram to be higher than is reflective of that gram's usage. For example, for an item identifier such as ‘piri piri chicken’, a gram score determined which is reflective of the uniqueness of the gram ‘piri’ would be less compromised in light of that gram being counted as only occurring in one item identifier rather than always being determined to have occurred twice whenever it does occur. The same approach may be taken for determining the number of occurrences of a gram in the at least one reference document.

Alternatively and/or in combination, the number of occurrences may be the absolute total number of occurrences of that gram in the at least one first document (e.g. multiple occurrences of the same gram in one item identifier will contribute multiple times to the number of occurrences).
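The two counting approaches may be contrasted with a short sketch. Splitting on whitespace and lower-casing are simplifying assumptions about how an item identifier is divided into grams, and the function names and menu entries are hypothetical.

```python
from collections import Counter


def identifier_occurrences(item_identifiers):
    """Count, for each gram, the item identifiers containing it at least once."""
    counts = Counter()
    for identifier in item_identifiers:
        # A set ensures a repeated gram is counted once per item identifier.
        counts.update(set(identifier.lower().split()))
    return counts


def absolute_occurrences(item_identifiers):
    """Count every occurrence of each gram, including repeats."""
    counts = Counter()
    for identifier in item_identifiers:
        counts.update(identifier.lower().split())
    return counts


menu = ["piri piri chicken", "chicken burger"]
per_identifier = identifier_occurrences(menu)
absolute = absolute_occurrences(menu)
```

With the per-identifier count, "piri" is recorded once for the menu above; with the absolute count it is recorded twice, while "chicken" is recorded twice under both approaches.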

In embodiments, the processor may be configured to select the item identifier in the event that the degree of similarity is above a similarity threshold. In embodiments, the processor may be configured to determine the degree of similarity based on a weighted combination of the grams in the query item identifier. In embodiments, attributing semantic importance to each gram of the query item identifier comprises weighting that gram based on its corresponding gram score. In embodiments, the processor may be configured to determine the degree of similarity based on a weighted combination of the grams in the item identifier. In embodiments, attributing semantic importance to each gram of the item identifier comprises weighting the gram based on its corresponding gram score. In embodiments, a degree of similarity at the similarity threshold is less than a degree of similarity between an identical item identifier and query item identifier. In embodiments, the processor is configured to determine the degree of similarity based on semantic importance attributed to: (i) grams in the item identifier and (ii) grams in the query item identifier.

In embodiments, the processor selecting the at least one first document may comprise selecting each first document comprising, for each of the query item identifiers, a respective item identifier having a degree of similarity above the similarity threshold. In embodiments, selecting the at least one first document may comprise selecting one first document based on a combination of the determined degree of similarity between each of the query item identifiers and their respective item identifiers in said one first document. In embodiments, selecting the one first document comprises selecting the document with the highest combination. In embodiments, the processor is configured to determine the gram score for a gram of text based on the total number of item identifiers in the data store. In embodiments, a method comprises: comparing a query item identifier with at least one item identifier based on determined scores for component grams of the identifiers; and selecting at least one item identifier to be provided as output to the resource based on said comparison.

In an aspect there is provided a computer program product comprising program instructions configured to program a processor to perform any of the methods described or claimed herein.

FIGURES

Embodiments will now be described, by way of example only, with reference to the accompanying drawings in which:

FIG. 1 shows a schematic drawing of a network system.

FIG. 2 shows a timing diagram indicating a method of operation using the network system of FIG. 1.

FIG. 3 shows a timing diagram indicating a method of operation using the network system of FIG. 1.

FIG. 4 shows a flow chart illustrating a method of operation of an example network system as illustrated in FIG. 1.

SPECIFIC DESCRIPTION

FIG. 1 shows a network system 100 comprising a server 30 for processing data requests.

The server 30 is arranged to perform a language processing method to process queries. The queries each include item identifiers associated with facilities, which may be indicative of items available at a facility. The method involves determining scores for grams of text (e.g. words). The scores may be used when comparing items in the query with items in a data store. Each item may be described by a plurality of words. The comparison between items in the data store and items in the query may be based on each of the words and their respective scores.

The facilities, and the item identifiers associated therewith, are typically associated with a first context. The context may be a unifying theme common to all items and facilities associated with it. For example, words associated with “food” may define a context. A data store of the server 30 stores an indication of known item identifiers associated with the facilities, e.g. meals offered by different restaurants. These items and facilities are considered to be associated with the first context. When determining the score for a word, the server 30 determines the score based on the number of occurrences of that word in documents in the data store. The data store stores documents comprising information largely pertaining to one context, therefore this number of occurrences provides an indication of the frequency of use of a word in that context. The importance of a word may be approximated based on its frequency of use.

When a word has a low number of occurrences in the documents in the data store, it may be assumed that it does not occur frequently in relation to the overall context of those documents. In which case, the server 30 determines the score for that word based on the number of occurrences of that word in a separate corpus of text (e.g. a second set of documents). The separate corpus of text is not associated with the same context as the documents in the data store. The server 30 may process the queries by selecting and/or comparing item identifiers (e.g. sentences) based on their component words and the scores determined for those words.

FIG. 1 shows a network system 100 comprising: a first user device, hereinafter user equipment (‘UE’) 21, a second UE 22, a first facility 41, a second facility 42 and a server 30. Each of the first UE 21, the second UE 22, the first facility 41 and the second facility 42 is connected to the server 30 over a network 50. A first user may operate the first UE 21 to communicate with the server 30, for example to send and receive messages relating to item identifiers in the data store and/or queries about item identifiers. A second user may operate the second UE 22 to also communicate with the server 30, e.g. over the network 50.

The server 30 comprises a processor 31, a data store 32 and a communications interface (not shown) for communication via the network. The processor 31 is coupled to the data store 32 so that it may read and/or write data to the data store 32. The data store 32 stores at least one first document 33. The at least one first document 33 comprises a plurality of item identifiers, and each item identifier comprises at least one gram of text. The data store 32 also stores an association (not shown) between each of these grams of text and a corresponding gram score. The data store 32 is illustrated as also storing at least one reference document 34. These may be stored on volatile and/or non-volatile memory of the data store 32.

The at least one first document comprises a plurality of first documents, each of which comprises a list of item identifiers. Each item identifier comprises at least one gram of text, and so each first document comprises a plurality of grams of text, and thus, across the plurality of first documents, a gram of text may occur in numerous different item identifiers. The gram score for each gram is a value indicative of the number of item identifiers in which that gram occurs. Thus, the gram score represents the number of occurrences of item identifiers including that gram across all of the plurality of first documents. The at least one first document may also include a summary document (not illustrated). The summary document is based on the other first documents. It comprises an indication of each different gram of text which is comprised within any of the other first documents. The summary document comprises, for each different gram of text used in any other first document, an association between that gram and a value indicative of the number of occurrences of that gram in item identifiers in any of the other first documents. The summary document provides a mapping between each different gram of text and a respective corresponding value indicative of the number of occurrences of that gram of text in the first documents. The summary document associates each different gram of text with its corresponding gram score.

A reference document comprises a list of reference item identifiers. Each reference item identifier comprises at least one gram of text, and so the reference document comprises a plurality of grams of text. Grams may occur a plurality of times in a reference document. The at least one reference document comprises a reference summary document, which performs the same function as the summary document above, except that the specific grams, and their numbers of occurrences, are different as they are based on a different set of documents. The facility comprises an associated device for interaction with the server. The facility provides a means for distribution of items, including items represented by the item identifiers. The facility may communicate with the server 30 and other UEs over the network.

In operation, the server 30 is configured to obtain a new item identifier. The new item identifier is received from a first facility. The new item identifier comprises at least one gram of text, for example in the form of a string of text which provides an indication of an item associated with a facility, e.g. an item available at the facility. The processor 31 is configured to attribute semantic importance to grams in the item identifiers, which comprises, for each gram of text in an identifier (e.g. the new item identifier, a query item identifier, or an item identifier in the data store), determining a gram score. Therefore, a new item identifier may have a plurality of gram scores associated therewith, one gram score for each respective gram in the item identifier.

The processor 31 is configured to determine the gram score for a gram of text based on the number of occurrences of the gram in the at least one first document 33 stored in the data store 32. The at least one first document 33 may comprise a plurality of documents, each comprising a plurality of item identifiers. In such a case, the number of occurrences is determined to be the number of occurrences of a gram of text in all of the first documents combined. The data store 32 stores a summary document which stores an association between every different gram of text in the at least one first document 33 and a number of occurrences for each of said grams in the at least one first document 33. The at least one first document 33 may comprise the summary document. The at least one first document 33 may comprise a plurality of first documents and a summary document. Each of the plurality of first documents is associated with a respective facility, and each provides a plurality of item identifiers. The summary document comprises an indication of each gram of text occurring in the plurality of first documents associated with a corresponding number of occurrences for said gram.

The processor 31 parses each gram in the new item identifier using the summary document. Parsing comprises comparing the gram in the new item identifier with grams in the summary document. Parsing comprises performing a textual analysis to determine if two grams match, i.e. they are identical. Parsing is based on textual content of a gram and the structure and order of the characters in the gram. In the event that the processor 31 determines that there is a match, e.g. the gram is already in a first document, and thus in the summary document, the processor 31 determines the gram score based on the stored number of occurrences corresponding to that gram. In the event that the processor 31 determines that there is not a match, the number of occurrences will be determined to be a small number, e.g. close to zero. Such a number is selected for smoothing purposes, as use of zero would produce an inverse document frequency (‘IDF’) value of infinity. The processor 31 is configured to determine the gram score in two different ways, the selected way being determined based on the number of occurrences of the gram in the first documents. This is determined based on comparing the number of occurrences with an occurrence threshold. The occurrence threshold may be a numerical value which has been selected, e.g. because it represents a statistically significant number.
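The lookup and the choice between the two scoring routes may be sketched as follows. The smoothing value of 0.5, the route labels, the counts in the summary mapping and the treatment of a count exactly equal to the occurrence threshold (handled here as not above it) are assumptions, since the text does not specify them.

```python
def occurrences_for(gram, summary, smoothing_count=0.5):
    """Exact-match lookup of a gram in the summary mapping.

    An unmatched gram is given a small non-zero count so that a later
    division by the count does not produce an infinite value.
    """
    return summary.get(gram, smoothing_count)


def scoring_route(occurrences, occurrence_threshold):
    """Choose how the gram score will be determined."""
    if occurrences > occurrence_threshold:
        return "first_documents_only"
    return "with_reference"


summary = {"chicken": 400, "pizza": 350, "cornichon": 2}  # hypothetical counts
known = occurrences_for("chicken", summary)
unknown = occurrences_for("gherkin", summary)
```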

In the event that the number of occurrences is greater than the occurrence threshold, the processor 31 is configured to determine a gram score for the gram based on this number of occurrences. The processor 31 determines an IDF value. The gram score may be this IDF value; it may be determined based thereon, e.g. an indication thereof. An IDF value provides an indication of how frequently that gram occurs in the corpus of text. The IDF value may be obtained using the formula:

IDF = log(N / n_t)

where N represents the total number of item identifiers in the at least one first document stored in the data store 32 (e.g. the number of item identifiers in each of the plurality of first documents). This total number includes each occurrence of an item identifier, so that e.g. an item identifier which occurs 100 times will contribute a value of 100 to the total number of item identifiers. Therefore, this number will be a constant for each calculation (assuming the data set does not grow). Here, n_t represents the number of occurrences of the gram for which the gram score is being determined (i.e. the number of item identifiers in which that gram occurs). This gram score may provide an indication of an inferred importance for the gram, or an indication of how distinguishing that gram is. This is because, if a particular gram occurs very frequently in the at least one first document 33, it may be considered commonplace and thus incapable of adequately distinguishing two different item identifiers. For example, the gram (word): “the” may provide a very small amount of insight or contribution to the overall meaning for the item identifier: “corn on the cob”. The processor 31 is therefore configured to determine a low gram score for a gram which has a high number of occurrences in the at least one first document 33.
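As a worked illustration of the formula, the following sketch computes IDF values for a frequent and a rare gram. The base of the logarithm is not stated in the text, so the natural logarithm is assumed here; the function name and the counts are hypothetical.

```python
import math


def idf(total_identifiers, occurrences):
    """Inverse document frequency: log of N divided by n_t (natural log assumed)."""
    return math.log(total_identifiers / occurrences)


total = 1000                   # N: item identifiers across all first documents
common_score = idf(total, 500)  # a gram found in half of the item identifiers
rare_score = idf(total, 10)     # a gram found in only ten item identifiers
```

The frequent gram receives the lower value, which matches the statement above that a gram with a high number of occurrences is given a low gram score.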

The at least one first document 33 is associated with a first context. For example, this context may be specific to e.g. a certain field of technology, available goods or services in a particular field or sector, or a type of activity. A context provides an indication of an overriding association linking the different item identifiers in the at least one first document 33. Each of the item identifiers may be related to one context, which has its own associated frequencies of grams of text occurring. For example, where the at least one first document 33 is associated with the context of rugby, the grams: “line-out” and “scrum” may occur regularly, whereas in different contexts they may not. The frequency of occurrence of these words is thus dependent on the context of the at least one first document 33.

In one example, first documents may relate to takeaway restaurants, the context associated therewith being “food”. Each item identifier may therefore represent an item on a menu for a restaurant. The first documents (and thus the summary document) will define a first corpus of text, which is based on all of the item identifiers. As a consequence, this corpus of text is specific to the context of “food”, and thus when determining an IDF value for a gram of text, the IDF value will be indicative of the frequency of occurrence of said gram in a series of food-specific item identifiers. As an example, when determining gram scores for each of the grams in the item identifier: “tasty chicken cornichon pizza”, the processor 31 may determine average scores for the grams: “chicken” and “pizza”, as both occur fairly frequently in item identifiers. The processor 31 may also determine high scores for the grams: “tasty” and “cornichon”, because neither occurs very frequently in the item identifiers.

In the event that the processor 31 compares the number of occurrences of a gram in the at least one first document 33 to the occurrence threshold, and the number of occurrences is below this threshold, the processor 31 is configured to determine the gram score in a different manner. In particular, the processor 31 is configured to determine the number of occurrences of that gram in a reference document 34. As with the first documents, the at least one reference document may comprise a plurality of reference documents, and the number of occurrences is determined based on the number of occurrences in all of the reference documents. Also, a summary reference document may be provided, which provides an association between each different gram which occurred in the reference documents, and a value indicating the number of occurrences of that gram in the reference documents. In such a case, the processor 31 may be configured to parse the gram in question using the summary reference document, and in the event that the gram matches a gram in the summary reference document, obtain the number of occurrences corresponding to that gram in the reference document.

This obtained reference number of occurrences may be compared with a reference threshold. In the event that the reference number of occurrences is below the reference threshold, the processor 31 is configured to determine that the frequency of that gram in the reference document 34 is insubstantial, and to determine the gram score for that gram based on the number of occurrences of the gram in the first documents, for example, based only on this number of occurrences. In the event that the reference number of occurrences is greater than the reference threshold, the processor 31 is configured to determine the gram score for said gram based on both the reference number of occurrences and the first document number of occurrences. The processor 31 is configured to generate an IDF value for the gram based on the reference documents. This may be determined in the manner described above for the first documents.

The reference documents may be selected to be contextual, or to provide an indication of a general overview of e.g. the English language. For example, articles from Wikipedia may be used as the reference documents. The summary reference document may therefore provide an indication of the number of occurrences, over the whole of Wikipedia, for each different gram of text which occurs in articles on Wikipedia. In the example above, where the context of the first documents is food, it is clear that the context defined by Wikipedia is different to food, and in this case, is generally acontextual.

By determining an IDF value for a gram based on the reference documents, the processor 31 is therefore operable to determine that the gram may be a commonly used gram in the English language, which is just not used commonly in connection with the context of the first documents. In the “tasty chicken cornichon pizza” example, the gram scores for “chicken” and “pizza” may be determined based on the first documents alone, because they occur frequently enough in the first documents. However, the processor 31 would be configured to determine reference scores for the grams “tasty” and “cornichon”, as they do not occur frequently in the first documents. The processor 31 may determine that “cornichon” does not occur frequently in the reference documents. In which case, the processor 31 is configured to determine the gram score for “cornichon” based on the first document IDF value, but not the reference document IDF value. This enables the processor 31 to identify that a gram used in an item identifier is rare in both contexts, and thus may represent a very niche feature.

The processor 31 may determine that “tasty” occurs frequently, and thus has a significant reference IDF value. In which case, the processor 31 is configured to determine the gram score based on both the first document IDF value and the reference document IDF value. This may be determined based on a combination of the two, e.g. an average. Alternatively, the gram score may be determined so that the size of the reference score is used to determine the size of reduction to the first document IDF value. For example, one may be subtracted from the other. Thus, the processor 31 is configured to identify grams of text which may seem to be very distinctive when viewed in terms of the first context, but are actually not when viewed in terms of a more general context. In the case of “tasty”, this would not provide much of an indication into the meaning or ‘essence’ of an item identifier. Because the processor 31 is configured to determine the gram score based also on the reference score, the gram score attributed to that gram may be suitably reduced in value to indicate that the gram is commonplace in another, more general, context.

The processor 31 may also be configured so that in the event that an original value for the gram score is greater than a gram threshold, the value for the gram score is reduced. The term ‘original value for the gram score’ is used to indicate the value for a gram score, as determined above, before this value is modified as set out herein. The gram threshold may be selected based on e.g. a statistical significance. For original values above the gram threshold, the gram score is reduced; for example, the gram score may be determined to be the gram threshold minus the difference between the original gram score and the gram threshold. For example, the gram “cornichon” may be determined to have an IDF value (and thus original value for the gram score) which is very high, as the gram occurs so infrequently. The gram score for “cornichon” will therefore be determined to be lower than the gram threshold, as it is sufficiently unknown that it is likely to not be distinctive.

The gram scores may thus provide an indication of the contribution that a gram may provide to an item identifier, and so how distinctive the gram is. Based on these gram scores the processor 31 is configured to select at least one item identifier in response to obtaining at least one query item identifier. The processor 31 is configured to select the at least one item identifier based on a determined degree of similarity between an item identifier in the first document 33 and the query item identifier. The processor 31 determines this degree of similarity using the determined gram scores for the grams in each item identifier.

The processor 31 is configured to determine a degree of similarity between two item identifiers. For instance, this may be between a query item identifier and an item identifier in a first document 33. The processor 31 may do so using a word embedding. The word embedding is a language model configured to map a gram of text to a location in a vector space, e.g. an N-dimensional vector space. The word embedding as described herein may comprise a neural network architecture. For instance, the word embedding may be an output from such a neural network architecture, e.g. in the form of a look-up table for conversion of grams to vector co-ordinates. This table may have been derived as a result of such a neural network architecture which has determined ‘learned weights’ for the gram to vector conversion. Either or both of the word embedding and the neural network architecture may be used for this conversion.

A suitable neural network architecture comprises at least one layer containing a plurality of neurons. Each neuron is configured to process input data to provide output data. This input data may be received from neurons in a preceding layer, and/or the output data may be provided to neurons in a subsequent layer. Each neuron is configured to perform an operation (e.g. based on a mathematical model or logical architecture) on its input data to provide the output data. Each stream of input data provided to a neuron may have a weighting applied thereto, which acts to scale the different sources of input to the neuron. The network is trained so that the input data may be a string of grams of text (e.g. an item identifier) or it may just be one gram of text. The embedding is configured to analyse input text and determine co-ordinates, in the vector space, based on this input text.

The processor 31 is configured to determine co-ordinates for item identifiers using the word embedding. Each gram in the item identifier contributes to the co-ordinates, e.g. the overall location for the item identifier is a combination of the locations for its component grams. This may be an average location or a sum. The processor 31 is configured so that the contribution of each gram in an item identifier to the overall location for that item identifier in the vector space is scaled. This scaling for each gram is determined based on its respective gram score. For instance, the gram score may be a numerical coefficient. Thus, grams with higher gram scores will have a proportionately greater influence on the location of an item identifier in the vector space than grams with lower gram scores. Therefore, the processor 31 may be configured to obtain an item identifier, and based on a determined gram score for each gram in the item identifier and the use of the word embedding, a location may be determined for the item identifier in the vector space.
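By way of illustration only, the scaling described above may be sketched in Python as follows. The function name `item_location`, the dictionary-based look-up table, and the choice to skip grams that are absent from the embedding are assumptions made for this sketch and are not prescribed by the present disclosure.

```python
from typing import Dict, List, Sequence


def item_location(
    grams: Sequence[str],
    embedding: Dict[str, List[float]],
    gram_scores: Dict[str, float],
    average: bool = True,
) -> List[float]:
    """Combine gram vectors into one location, scaling each by its gram score."""
    dims = len(next(iter(embedding.values())))
    total = [0.0] * dims
    weight_sum = 0.0
    for gram in grams:
        vector = embedding.get(gram)
        if vector is None:
            continue  # assumption: grams without a vector contribute nothing
        score = gram_scores.get(gram, 0.0)  # gram score used as a coefficient
        for i, component in enumerate(vector):
            total[i] += score * component
        weight_sum += score
    if average and weight_sum > 0:
        # Weighted average location
        return [component / weight_sum for component in total]
    # Weighted sum location
    return total
```

In such a sketch, a gram with a gram score of zero has no influence on the location, whereas a gram with a high gram score pulls the location towards its own vector.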

The processor 31 is configured to determine a degree of similarity between two item identifiers based on their respective locations in the vector space. These locations are determined based on the determined gram scores. The processor 31 is configured to determine the degree of similarity based on a cosine similarity between the two item identifiers. The cosine similarity is determined by calculating the cosine of the angle (i.e. based on the displacement of each location from the origin) between the two locations in the vector space. This provides an indication of the similarity of orientation for the two item identifiers. A value for the cosine similarity will be between negative one and one, unless only positive space is considered, in which case the value would be between zero and one. The degree of similarity may be a value based on the cosine similarity value; it may be the cosine similarity value. The cosine similarity between two vectors (the locations of the item identifiers) may be determined using:

$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}} \, \sqrt{\sum_{i=1}^{n} B_{i}^{2}}}$

Where A and B are vectors for the two locations, and $A_{i}$ and $B_{i}$ are the weighted locations for the component grams. The processor 31 is therefore operable to determine the degree of similarity between two item identifiers. Based on this determined degree of similarity, the processor 31 may be configured to identify a relationship between the two item identifiers. For example, it may be determined that they are exactly the same, or it may be determined that they are similar above a similarity threshold. The similarity threshold may be selected so that two item identifiers may be determined to be similar in the event that their degree of similarity is above the similarity threshold, where the similarity threshold does not require the two item identifiers to be identical. For example, a replacement may be requested for an item identifier, wherein the processor 31 is configured to determine a suitable item identifier to be the replacement. In this example, the suitable item identifier may be determined as an item identifier which satisfies the similarity threshold.
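As a non-limiting illustration, the cosine similarity and the comparison with a similarity threshold may be sketched in Python as follows. The function names and the convention of returning zero for a zero-length vector are assumptions of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two locations in the vector space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero location is treated as dissimilar
    return dot / (norm_a * norm_b)


def is_similar(a: Sequence[float], b: Sequence[float], similarity_threshold: float) -> bool:
    """True when the degree of similarity is above the similarity threshold."""
    return cosine_similarity(a, b) > similarity_threshold
```

Because only the orientation of the two locations is compared, scaling either vector by a positive constant leaves the result unchanged.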

In the “tasty chicken cornichon pizza” example, the processor 31 may determine that the item identifier “tasty chicken cornichon pizza” is similar to “chicken pizza”. This is because, as described above, the processor 31 will determine lower gram scores for the grams “tasty” and “cornichon”. In which case, a majority of the contribution to the location in the vector space for this item identifier will come from the grams “chicken” and “pizza”, and their respective weightings. As these grams and weightings are also found in the “chicken pizza” item identifier, the two item identifiers will be located near to one another in the vector space. It is to be appreciated that selecting different values for the occurrence threshold, the gram threshold and the reference threshold may result in such determinations producing different degrees of similarity. The values for these parameters may be determined as part of training the system, so that suitable degrees of similarity may be determined for known item identifiers for which a desired degree of similarity may be known.

In some embodiments, the processor 31 may be configured to identify the presence of a trigger gram in an item identifier. A trigger gram may represent a type of subjunctive clause or natural break in the language. For example, trigger grams may be grams such as “with” or “and”, which indicate the presence of clauses in an item identifier, these grams being used to identify a transition from a first clause to a second one. The processor 31 is configured to determine the presence of such trigger grams and to determine the scores for grams in an item identifier based on the presence of these trigger grams. For instance, trigger grams may be detected by parsing the grams in an item identifier against a known set of trigger grams, and determining if there are any matches.

In the event that a trigger gram is detected, the processor 31 is configured to adjust gram scores for that item identifier. This may comprise reducing the gram scores for any subsequent grams after the trigger gram. It may comprise zeroing these gram scores so that only a first portion of the item identifier is used when comparing two item identifiers. This may enable the processor 31 to determine that two item identifiers, “chicken pizza” and “chicken pizza with side salad”, are similar, where, without the trigger gram being detected and the subsequent grams having a lower gram score, they may not have been determined to be similar. This may enable the processor 31, when comparing two item identifiers, to focus on the primary aspect (first portion) of an item identifier.
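One possible way of adjusting gram scores on detection of a trigger gram is sketched below in Python, purely as an example. The set `TRIGGER_GRAMS`, the `reduction` factor and the choice to zero the score of the trigger gram itself are assumptions of this sketch; the disclosure only requires that scores for grams after the trigger gram are reduced or zeroed.

```python
from typing import List, Sequence

# Hypothetical known set of trigger grams
TRIGGER_GRAMS = {"with", "and"}


def adjust_for_trigger(
    grams: Sequence[str],
    gram_scores: Sequence[float],
    reduction: float = 0.0,
) -> List[float]:
    """Scale the scores of grams that follow the first trigger gram.

    reduction=0.0 zeroes the subsequent scores; a value between 0 and 1
    merely reduces them.
    """
    adjusted = []
    seen_trigger = False
    for gram, score in zip(grams, gram_scores):
        if gram.lower() in TRIGGER_GRAMS:
            seen_trigger = True
            adjusted.append(0.0)  # assumption: the trigger gram itself is ignored
        elif seen_trigger:
            adjusted.append(score * reduction)
        else:
            adjusted.append(score)
    return adjusted
```

With the default `reduction` of zero, only the first portion of “chicken pizza with side salad” retains non-zero gram scores, so its location is determined by “chicken” and “pizza” alone.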

In some embodiments, the processor 31 may obtain a query item identifier comprising a trigger gram. In this case, the processor 31 may be configured to detect a trigger gram between a first portion of the query item identifier and a second portion of the query item identifier. In this case, the processor 31 may separate the query item identifier into two query item identifiers, one for each portion. The processor 31 may then look for a first item identifier in the first documents which is similar to the first portion of the query item identifier, and a second item identifier in the first documents which is similar to the second portion of the query item identifier. The processor 31 may therefore be able to determine that “chicken pizza with side salad” is similar to a combination of the item identifiers “chicken pizza” and “side salad”. However, it is to be appreciated that the processor 31 may also determine a degree of similarity based on the full item identifiers as well as, and/or before, determining a degree of similarity based on the separate portions.
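A minimal sketch of separating a query item identifier at a trigger gram is given below in Python. The function name `split_on_trigger`, the whitespace-based splitting into grams and the default trigger grams are assumptions of this sketch; it also generalises to more than one trigger gram, which the disclosure does not require.

```python
from typing import List, Sequence


def split_on_trigger(
    query_item_identifier: str,
    trigger_grams: Sequence[str] = ("with", "and"),
) -> List[str]:
    """Split a query item identifier into portions separated by trigger grams."""
    portions: List[str] = []
    current: List[str] = []
    for gram in query_item_identifier.split():
        if gram.lower() in trigger_grams:
            # A trigger gram closes the current portion and is itself dropped
            if current:
                portions.append(" ".join(current))
                current = []
        else:
            current.append(gram)
    if current:
        portions.append(" ".join(current))
    return portions
```

Each returned portion may then be compared with the item identifiers in the first documents in the same manner as a complete query item identifier.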

FIG. 2 shows a timing diagram indicating a method of operation using the network system 100 of FIG. 1. In particular, FIG. 2 illustrates communication between the first UE 21 and the server 30, the first facility 41 and the server 30, and processing within the server 30.

At step 200, the server 30 obtains a new item identifier comprising at least one gram of text. This is illustrated as being sent from the facility; for instance, an indication of the new item identifier may be sent in a new item identifier message. The new item identifier may be indicative of a new item associated with the first facility 41, e.g. an item which is now available at the first facility 41. In which case, the new item identifier is sent to the server 30 so that the server 30 may add it to the first document 33 associated with that facility in the data store 32. At step 210, for each gram in the item identifier, the processor 31 determines the number of occurrences of that gram in the first documents. In the event that the number of occurrences for a gram is greater than the occurrence threshold, the processor 31 determines a gram score for that gram, as set out above. In the event that the number of occurrences for a gram is below the occurrence threshold, the method proceeds to step 220.

At step 220, a gram score is determined for any grams for which the number of occurrences of that gram in the first documents is below the occurrence threshold. The gram score for these grams is based on both the number of occurrences of that gram in the first documents and in the reference documents, and is determined as set out above. At step 230, once all of the gram scores have been determined for the grams in the new item identifier, the new item identifier is stored in the data store 32. This may comprise adding the new item identifier to the first document 33 associated with the first facility 41. It may also comprise storing any new grams in the summary document, so that they may be used for future determinations. It may also comprise updating the stored number of occurrences for each of the grams in the summary document which were also in the new item identifier, i.e. incrementing their respective counts by one to account for the addition of the item identifier (and its component grams) to the data store 32.
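The storing operation of step 230 may, for example, be sketched in Python as follows. Representing the first document as a list and the summary document as a `collections.Counter`, and splitting the item identifier into grams on whitespace, are assumptions of this sketch.

```python
from collections import Counter
from typing import List


def store_item_identifier(
    first_document: List[str],
    summary_document: Counter,
    item_identifier: str,
) -> None:
    """Add a new item identifier and update the stored occurrence counts."""
    first_document.append(item_identifier)
    for gram in item_identifier.lower().split():
        # A new gram starts at zero in a Counter, so this both stores new
        # grams and increments the count of grams already present.
        summary_document[gram] += 1
```

A gram appearing once in the new item identifier thus has its stored count incremented by one, and a previously unseen gram is stored with a count of one.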

At step 240, the processor 31 obtains an indication of a query item identifier. In FIG. 2 this is illustrated as being sent from the first UE 21 to the server 30. For example, this indication may be sent in a query message. This may represent an indication that a user of the first device is requesting an item, and the query item identifier provides an indication of the requested item. At step 250, a query gram score is determined for each of the grams in the query item identifier. The query gram scores are determined as set out above, and in the same manner as for step 210. At step 260, a query gram score is determined as in step 220. At step 270, the query item identifier is compared with item identifiers in the first documents. This comparison is as set out above, and is based on the query gram scores and gram scores associated with the grams in the item identifiers. For each item identifier in the first documents, a degree of similarity may be determined. This determined degree of similarity is compared with a similarity threshold, and in the event that this degree of similarity is above the similarity threshold, the processor 31 is configured to select this item identifier and the method proceeds to step 280.

At step 280, the processor 31 is configured to retrieve (or retrieve an indication of) the item identifier with the degree of similarity with the query item identifier above the similarity threshold. At step 290, the processor 31 sends this indication to the first UE 21. For instance, this may be sent in a response message. The response message is configured to provide an output at the first UE 21. For example, this output may be to display an indication of this item identifier to a user of the first UE 21.

FIG. 3 shows a timing diagram indicating a method of operation using the network system 100 of FIG. 1. In particular, FIG. 3 illustrates communication between the first UE 21 and the server 30, the first facility 41 and the server 30, and the second facility 42 and the server 30, as well as the internal processing of the server 30. FIG. 3 relates to a recovery system for an item identifier. In this case, the server 30 may receive a request for an item from the first UE 21, in the form of the server 30 receiving an indication of an item identifier associated with the first facility 41. This request is communicated from the server 30 to the first facility 41.

At step 300, the first facility 41 sends a request failure message to the server 30. This message comprises an indication that the first facility 41 is unable to process the request from the first UE 21. For instance, the request may be for an item which the first facility 41 no longer has. At step 310, the processor 31 is configured to obtain an indication of a query item identifier associated with the request failure message. For example, messages between the server 30 and the first facility 41 may be associated with an identifier, which may enable the server 30 to look up the previous messages associated with that identifier, which may include an original query item identifier request message from the first UE 21. From this message, the processor 31 may obtain the indication of the query item identifier. The processor 31 is then configured to determine, for each gram in the obtained query item identifier, the number of occurrences of that gram in the first document 33. In the event that the number of occurrences is greater than an occurrence threshold, the processor 31 is configured to determine the query gram score based on the number of occurrences, as set out above. In the event that the number of occurrences is not greater than the occurrence threshold, the method proceeds to step 320.

At step 320, the processor 31 determines the query gram score for any grams not satisfying the occurrence threshold. The processor 31 determines this based on the number of occurrences of the gram in the reference documents, as set out above. At step 330, once the processor 31 has determined a query gram score for each gram in the query item identifier, the processor 31 compares the query item identifier with the item identifiers in the first document 33. This comparison involves determining a degree of similarity between item identifiers in the first documents and the query item identifier. Each degree of similarity is determined as set out above. In the event that the processor 31 determines that an item identifier in one of the first documents has a degree of similarity with the query item identifier above a similarity threshold, the method proceeds to step 340.

At step 340, the processor 31 obtains said item identifier, or an indication thereof, from the first document 33 in the data store 32. At step 350, the server 30 sends an indication of said item identifier in a replacement message to the first UE 21. This message may comprise an indication that the original item identifier associated with the first facility 41 is unavailable, and that a similar item identifier has been found which is associated with a second facility 42. At step 360, the first UE 21 sends, in response to receiving the replacement message, an acceptance message. The acceptance message comprises an indication of whether or not the first UE 21 accepts the item identifier in the replacement message. At step 370, in the event that the acceptance message comprises an indication that the first UE 21 accepts the item identifier in the replacement message, the server 30 is configured to send, to the second facility 42, an item request message. This item request message comprises an indication of: (i) a request for the item associated with the item identifier, and (ii) a user associated with the first UE 21.

In some examples, this method may be considered a method of order recovery, in which an order for an item at one facility which does not succeed is ‘recovered’. This is by determining a suitable item at another facility which could be used to ‘recover’ the order (e.g. by placing the same order at the other facility). In some embodiments, the request failure message may be associated with a plurality of item identifiers. This plurality of item identifiers may be associated with a first facility 41 and thus from a corresponding first document 33 in the data store 32. Each of the plurality of first documents may be associated with a corresponding facility and may comprise a plurality of item identifiers. The item identifiers for a first document 33 may represent items available at the corresponding facility.

In such embodiments, the processor 31 is configured to determine query gram scores for each gram in each of the query item identifiers. These query gram scores are used to determine the degree of similarity, as above. However, when selecting item identifiers from the first documents to replace the original item identifiers, the processor 31 may select them based on an added constraint that all of the replacement query item identifiers must be obtained from the same first document 33. For example, this may relate to recovering an order for some products from one facility by finding a suitable other facility which may offer the same, or suitably similar, products, and re-routing the order to that facility. This may be considered an order recovery problem with a limited solution set.

The processor 31 is configured to determine a degree of similarity between each of the query item identifiers and item identifiers in the first documents. In the event that the processor 31 determines that there is a first document 33 in the data store 32 which comprises, for each query item identifier, at least one item identifier which has a determined degree of similarity above the similarity threshold, the processor 31 may recover the order using that first document 33. This may comprise sending an indication of the first document 33 and an indication of the relevant item identifiers to the first UE 21. In the event that there are a plurality of such first documents, the processor 31 may select the first document 33 comprising item identifiers having the highest overall degree of similarity to the query item identifiers. Alternatively, the processor 31 may be configured to select all such first documents and send an indication of all of the first documents to the first UE 21.

In the event that the processor 31 does not determine that there is a first document 33 in the data store 32 which comprises, for each query item identifier, at least one item identifier which has a determined degree of similarity above the similarity threshold, the processor 31 is configured to select a ‘next-best’ document. This may comprise the processor 31 selecting the first document 33 having item identifiers which have the highest overall degree of similarity with the query item identifiers. Overall degree of similarity comprises a combination (e.g. average) of each degree of similarity between the relevant item identifiers in the first document 33 and the query item identifiers. Alternatively, the processor 31 may be configured to select the first document 33 which comprises suitable item identifiers so that the greatest possible proportion of the query item identifiers may be matched with an item identifier having a degree of similarity above the similarity threshold.
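The selection of a first document for order recovery, including the ‘next-best’ fallback, may be sketched in Python as follows. The sketch assumes, hypothetically, that the best degree of similarity for each query item identifier has already been determined per first document and is supplied as a dictionary keyed by a document label; it uses the average as the overall degree of similarity and implements only the first of the two fallback options described above.

```python
from typing import Dict, List, Optional


def overall_similarity(scores: List[float]) -> float:
    """Overall degree of similarity, taken here as the average."""
    return sum(scores) / len(scores) if scores else 0.0


def select_recovery_document(
    best_matches: Dict[str, List[float]],
    similarity_threshold: float,
) -> Optional[str]:
    """Pick the first document from which the whole order may be recovered.

    best_matches maps a document label to the best degree of similarity
    found in that document for each query item identifier.
    """
    if not best_matches:
        return None
    # Documents in which every query item identifier has a match above threshold
    complete = {
        doc: scores
        for doc, scores in best_matches.items()
        if scores and all(s > similarity_threshold for s in scores)
    }
    # Fall back to the 'next-best' document when no document is complete
    candidates = complete if complete else best_matches
    return max(candidates, key=lambda doc: overall_similarity(candidates[doc]))
```

In this sketch a document that matches every query item identifier is preferred over a document with a higher average but an unmatched item.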

The server 30 may thus enable an order to be recovered in more circumstances, because an item identifier and a query item identifier may be determined to be similar based on the scoring system described above.

A method of determining a gram score will now be described with reference to FIG. 4.

At step 400, the method starts and proceeds to step 410 at which a gram is obtained. At step 420, the number of occurrences of that gram in the at least one first document is determined. This is determined as described above, wherein the gram is parsed using the summary document to determine if there is a match for that gram. In the event that there is a match, the number of occurrences corresponding to the matched gram is taken to be the number of occurrences of the gram in the at least one first document. At step 430, the number of occurrences is compared to the occurrence threshold. In the event that the number of occurrences is below the occurrence threshold, the method proceeds to step 440, at which the number of occurrences of the gram in the at least one reference document is determined. As above, this is determined using the reference summary document to determine if there is a match for that gram. In the event that there is a match, the method comprises identifying the number of occurrences corresponding to the gram. The method then proceeds to step 450.

At step 450, the number of occurrences of the gram in the reference document is compared to a reference threshold. In the event that the number of occurrences of the gram in the at least one reference document is less than the reference threshold, or, at step 430, it is determined that the number of occurrences of the gram in the at least one first document is greater than the occurrence threshold, the method proceeds to step 460. At step 460, the gram score is determined based on the number of occurrences of the gram in the at least one first document. The gram score is determined based on an inverse document frequency calculation as described above.

In the event that, at step 450, the number of occurrences is greater than the reference threshold, the method proceeds to step 470. At step 470, the gram score is determined based on the number of occurrences of the gram in the at least one first document and the number of occurrences of the gram in the at least one reference document. The gram score is determined as above in that a score is first determined as it would be at step 460 (i.e. based on the number of occurrences in the first document); this score is then reduced based on a score determined for the number of occurrences of the gram in the at least one reference document. The method then proceeds to step 480. At step 480, the determined gram score is compared to a gram threshold. In the event that the gram score is greater than the gram threshold, the method proceeds to step 490, wherein the gram score is reduced so that it is below the gram threshold as described above. In the event that, at step 480, the gram score is less than the gram threshold, the gram score remains the same as the determined score (e.g. at either one of steps 460 or 470). The method then finishes at step 510.
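The flow of FIG. 4 may be summarised, as a sketch only, by the following Python function. It assumes that the first document IDF value and a reference score have already been calculated and are passed in as numbers; it uses subtraction for the reduction at step 470, which is only one of the options mentioned above; and it treats a count equal to a threshold as not exceeding it, a case the description leaves open. All names and example values are hypothetical.

```python
def determine_gram_score(
    first_count: int,
    first_idf: float,
    reference_count: int,
    reference_score: float,
    occurrence_threshold: int,
    reference_threshold: int,
    gram_threshold: float,
) -> float:
    """Sketch of steps 430 to 490 of FIG. 4."""
    rare_in_first = first_count < occurrence_threshold          # step 430
    common_in_reference = reference_count > reference_threshold  # step 450
    if rare_in_first and common_in_reference:
        # Step 470: first document score reduced by the reference score
        score = first_idf - reference_score
    else:
        # Step 460: score based on the first documents alone
        score = first_idf
    if score > gram_threshold:                                   # step 480
        # Step 490: threshold minus the excess over the threshold
        score = gram_threshold - (score - gram_threshold)
    return score
```

With illustrative values, a frequent gram such as “chicken” keeps its first document score, a gram such as “tasty” is reduced at step 470, and a very rare gram such as “cornichon” is brought below the gram threshold at step 490.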

In some embodiments, the server 30 may be implemented to support the functionality of ‘chat bots’. These may comprise any method of enabling a user to provide an indication of an item they wish to order, but without providing an indication of the exact item identifier and facility. For example, this may be in the form of an instant messaging service or voice activated orders. The processor 31 is configured to obtain the indication of the item identifiers from this communication, and to determine gram scores for the indication. Then, as above, a suitable facility and item identifiers may be selected based on their determined degree of similarity to the obtained indications.

In some embodiments, the determined degrees of similarity may be used to determine that item identifiers are directed to the same item even though the item identifiers differ slightly. This may be implemented through the provision of a second similarity threshold selected to indicate a sufficiently high degree of similarity to indicate that two items are sufficiently identical. This may enable the provision of improved analytics. For example, it may enable the processor 31 to perform comparisons (e.g. calculate statistics) between items rather than item identifiers because the processor 31 may determine that all item identifiers within a selected degree of similarity to one another represent the same item and thus may all be used for the same analytic purposes. For example, “chicken tikka”, “chicken tikka curry” and “tasty chicken tikka” may all be determined to represent the same item and thus analytics relating to “chicken tikka” may be performed based on all of these item identifiers.

It is to be appreciated in the context of this disclosure that there may be a plurality of similarity thresholds. For instance, a degree of similarity above each threshold may indicate a different classification should be used for the degree of similarity between two item identifiers. For instance, one threshold may indicate an identical or direct match; one may indicate a non-identical but very similar match; one may indicate a similar match. It is to be appreciated in the context of this disclosure that a value for any of the thresholds described herein may be determined in a number of ways. They may be selected based on empirical data, where the thresholds are set so that the determinations may produce a desired outcome. The thresholds may also apply to any suitable metric. For example, the occurrence threshold may be a threshold applied to the IDF values rather than to the number of occurrences. Likewise, the same may be applied for reference IDF values. For example, the gram threshold may be selected to be a certain value so that certain item identifiers are focused on more than others. This may involve tailoring the threshold to the data set.

It is to be appreciated in the context of this disclosure that any suitable method of using the gram scores and item identifiers may be used to determine the degree of similarity. For instance, term frequency-inverse document frequency values may be used instead of the inverse document frequency values. In which case, the scores and/or values thereof may be adjusted accordingly. Likewise, a different determination of similarity may be used to the cosine similarity. For example, a cosine distance may be used. It is to be appreciated that in the event that the number of occurrences of a gram in the reference document 34 is below the reference threshold, a value for the reference IDF may still be used to determine the gram score. It is to be appreciated that the data store 32 may store, for each gram (e.g. in the summary document), an indication of a previously determined gram score for that document and a previously determined location for that document in the vector space. This may increase the speed of comparisons when compared to determining a degree of similarity based on determining both locations on-the-fly. The processor 31 may be configured to periodically re-determine these values and update their indications in the data store 32 accordingly.

It is to be appreciated that item identifiers need not be selected immediately in response to determining that they are similar to a query item identifier. For example, at step 270 there may be some other method of selecting the item identifier, e.g. still comparing the query item identifier with each item identifier in the first documents and selecting the item identifier with the highest determined degree of similarity, or a selection of the item identifiers with the highest degrees of similarity.

Aspects of the present disclosure may address technical problems relating to natural language processing. For example, in many technical scenarios, human language input (e.g. a string of text) may be provided which needs to be interpreted by a machine, such as a robot, e.g. a computer. In such input, not all words will carry the same amount of semantic weight when viewed in the context of the string as a whole. For humans, experience with the use of language enables the words in the sentence to be attributed a reasonable amount of weight (as appropriate in context). However, programming such experience into a computer is less straightforward. The methods and systems for natural language processing disclosed herein may provide a solution to such technical problems.

For example, embodiments of the disclosure may be used to provide visual interpretation of audio, e.g. speech, data for the hearing impaired to provide improved means for communication. As such, input (e.g. a sentence) may be represented pictographically on a display, e.g. using non-alphanumeric indicators to indicate its semantic meaning without the need for verbal transcription, to help enable a person quickly to understand the meaning of what is being communicated to them. It is to be appreciated that this may extend to a study of foreign languages, where a sentence is explained to an individual by showing such an indicator of the meaning of words within that sentence. However, words such as “the” cannot easily be graphically represented, and even if they could be, they are unlikely to provide much of a visual prompt to a user. It is therefore desirable for such a visual prompt based system to be able to identify key words in an input sentence.

As an example, for the phrase “the elephant in the room”, some of these words may provide little insight when pictographically represented to a user. However, by presenting the user with a picture of an elephant, and perhaps a picture of a room, the user would easily be able to identify what was being communicated. Embodiments of the present disclosure may provide a computer-implemented method capable of identifying the more important/significant words in a sentence. As such, it may provide a technical solution to improving systems for displaying the content of input text to a user.

Such systems and methods for natural language processing may extend to robotics, where a robot may be required to receive input text such as a command, and the robot is required to interpret said command and in response to perform an action. This approach may also be used when developing chat bots or other such technical means. When presented with an ambiguous input string, such as a sentence with words missing or where instructions were not clearly understood, such technical means may need to make a decision on what action the input command requires. By identifying key components of the command, such robotic systems may be able to infer, or make a more educated guess as to, the contents of the input command. For example, a voice-activated command system such as Amazon Alexa® may receive an input string of text containing many different words, two of which may be “Spotify®” and “Hendrix”. From this, it may determine that these grams of text have substantially higher gram scores than other grams in the string, and as such it can determine that these may be more decisive components of the sentence, and may deduce that ‘Spotify’ should be opened and ‘Hendrix’ played, without actually receiving that command in full.
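By way of illustration only, identifying grams whose gram scores are substantially higher than those of the other grams in an input string may be sketched as follows. The gram scores shown, the use of the median as a baseline, the ratio of 3 and the function name key_grams are all assumptions made for this sketch.

```python
from statistics import median


def key_grams(text, gram_scores, ratio=3.0):
    """Return the grams whose score exceeds `ratio` times the median
    score of all grams in the input string (hypothetical criterion)."""
    grams = text.lower().split()
    if not grams:
        return []
    scores = [gram_scores.get(g, 0.0) for g in grams]
    baseline = median(scores)
    return [g for g, s in zip(grams, scores) if s > ratio * baseline]


# Hypothetical, previously determined gram scores.
SCORES = {
    "please": 0.5, "could": 0.3, "you": 0.2, "open": 1.0, "spotify": 6.0,
    "and": 0.1, "play": 1.2, "some": 0.3, "hendrix": 7.0,
}

decisive = key_grams("please could you open Spotify and play some Hendrix", SCORES)
```

Under these assumed scores, only "spotify" and "hendrix" exceed the baseline by the required ratio, so those grams would be treated as the decisive components of the input.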

The methods and systems described herein may address a technical problem of controlling communication and transport of commodities in a network. By identifying more important component grams in a search query, a central server may be able to respond to a query comprising a string of component grams by providing a series of search results. The search results may be determined based on a use of the present method to determine which results are considered most relevant. Then, in response to the search query, the central server may respond by sending a communication (e.g. a network message) to user equipment associated with the search query. The contents of this message will thus be directly affected as a result of implementation of the present method. Consequently, this communication between separate devices (e.g. one belonging to a host server and one to a subscribing user, such as a mobile telephone or tablet) may be affected.

Furthermore, where this technical problem includes transporting commodities, the contents of the network message sent to the user equipment may also affect the transport of commodities from a first physical location to a second physical location. For example, each commodity included in the search results may be associated with its own physical location, and based on the contents of the search query, a particular commodity (associated with a particular location) may be selected resulting in transport of said commodity from its particular initial location to a specified location. It is to be appreciated that such a technical process of transportation is directly affected as a consequence of the contents of the network message sent to the user equipment. In embodiments, the commodity may be any deliverable good such as manufactured articles, raw materials, perishable goods etc.

The present system and method may enable the provision of more appropriate search results in response to a notional search query. Consequently, in scenarios where a user who receives the search results then has the option of deciding whether or not to select retrieval of the goods (e.g. retrieval of data), the provision of improved search results may increase the likelihood of goods being retrieved. As such, this may have an effect on e.g. a delivery system in terms of increasing requirements for relocation of goods, e.g. physical relocation such as delivery of articles from a first physical location to a designated second location.

Examples have been described above in relation to perishable goods, i.e. such as takeaway food, where the server 30 is used to control orders placed at a series of takeaway restaurants. However, the scope of the claims is considered to extend beyond such examples, and could be used in many more scenarios. For example, the server 30 may be an online library resource, and the server 30 may be configured to determine and identify data associated with item identifiers which are determined to be similar to the query item identifiers. Accordingly, a user may provide an indication of a text they wish to retrieve, and in response the processor 31 may determine if any of the data items in the resource are suitably similar to the received indication, and in the event that they are, the server 30 may send that item, or an indication thereof, to the user.

The user equipment illustrated in FIG. 1 has been described as a mobile telecommunications handset, but it will be appreciated in the context of the present disclosure that this encompasses any user equipment (UE) for communicating over a wide area network 50 and having the necessary data processing capability. It can be a hand-held telephone, a laptop computer equipped with a mobile broadband adapter, a tablet computer, a Bluetooth gateway, a specifically designed electronic communications apparatus, or any other device. It will be appreciated that such devices may be configured to determine their own location, for example using global positioning system (GPS) devices and/or based on other methods such as using information from WLAN signals and telecommunications signals. The user device may comprise a computing device, such as a personal computer, or a handheld device such as a mobile (cellular) telephone or tablet. Wearable technology devices may also be used. Accordingly, the communication interface of the devices described herein may comprise any wired or wireless communication interface such as WI-FI (RTM), Ethernet, or direct broadband internet connection, and/or a GSM, HSDPA, 3GPP, 4G or EDGE communication interface.

Messages described herein may comprise a data payload and an identifier (such as a uniform resource identifier, URI) that identifies the resource upon which to apply the request. This may enable the message to be forwarded across the network 50 to the device to which it is addressed. Some messages include a method token which indicates a method to be performed on the resource identified by the request. For example, these methods may include the hypertext transfer protocol, HTTP, methods “GET” or “HEAD”. The requests for content may be provided in the form of hypertext transfer protocol, HTTP, requests, for example such as those specified in the Network Working Group Request for Comments: RFC 2616. As will be appreciated in the context of the present disclosure, whilst the HTTP protocol and its methods have been used to explain some features of the disclosure, other internet protocols, and modifications of the standard HTTP protocol, may also be used.
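By way of illustration only, a request carrying a method token and an identifier of the resource may be sketched using the Python standard library as follows. The helper name build_request and the address http://example.com/items/42 are hypothetical; the request object is only constructed here and is not sent over any network.

```python
from urllib.request import Request


def build_request(resource_uri: str, method: str = "GET") -> Request:
    """Build (but do not send) an HTTP request for the given resource.

    Only the two method tokens named in the description are accepted
    in this sketch.
    """
    if method not in ("GET", "HEAD"):
        raise ValueError("unsupported method token: " + method)
    return Request(resource_uri, method=method)


# Hypothetical resource address, used purely for illustration.
request = build_request("http://example.com/items/42", "HEAD")
```

The resulting object holds the method token (request.get_method()) and the identifier of the resource (request.full_url) that a message of this kind would carry.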

As described herein, network messages may include, for example, HTTP messages, HTTPS messages, Internet Message Access Protocol messages, Transmission Control Protocol messages, Internet Protocol messages, TCP/IP messages, File Transfer Protocol messages or any other suitable message type. The processor 31 of the server 30 (and any of the activities and apparatus outlined herein) may be implemented with fixed logic such as assemblies of logic gates or programmable logic such as software and/or computer program instructions executed by a processor 31. Other kinds of programmable logic include programmable processors, programmable digital logic (e.g., a field programmable gate array (FPGA), an erasable programmable read only memory (EPROM), an electrically erasable programmable read only memory (EEPROM)), an application specific integrated circuit, ASIC, or any other kind of digital logic, software, code, electronic instructions, flash memory, optical disks, CD-ROMs, DVD ROMs, magnetic or optical cards, other types of machine-readable mediums suitable for storing electronic instructions, or any suitable combination thereof. Such data storage media may also provide the data store 32 of the server 30 (and any of the apparatus outlined herein).

It will be appreciated from the discussion above that the embodiments shown in the Figures are merely exemplary, and include features which may be generalised, removed or replaced as described herein and as set out in the claims. With reference to the drawings in general, it will be appreciated that schematic functional block diagrams are used to indicate functionality of systems and apparatus described herein. For example, the functionality provided by the data store 32 may in whole or in part be provided by a processor 31 having one or more data values stored on-chip. In addition, the processing functionality may also be provided by devices which are supported by an electronic device. It will be appreciated however that the functionality need not be divided in this way, and should not be taken to imply any particular structure of hardware other than that described and claimed below. The function of one or more of the elements shown in the drawings may be further subdivided, and/or distributed throughout apparatus of the disclosure. In some embodiments the function of one or more elements shown in the drawings may be integrated into a single functional unit.

The above embodiments are to be understood as illustrative examples. Further embodiments are envisaged. It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.

In some examples, one or more memory elements can store data and/or program instructions used to implement the operations described herein. Embodiments of the disclosure provide tangible, non-transitory storage media comprising program instructions operable to program a processor 31 to perform any one or more of the methods described and/or claimed herein and/or to provide data processing apparatus as described and/or claimed herein.

Certain features of the methods described herein may be implemented in hardware, and one or more functions of the apparatus may be implemented in method steps. It will also be appreciated in the context of the present disclosure that the methods described herein need not be performed in the order in which they are described, nor necessarily in the order in which they are depicted in the drawings.

Accordingly, aspects of the disclosure which are described with reference to products or apparatus are also intended to be implemented as methods and vice versa. The methods described herein may be implemented in computer programs, or in hardware or in any combination thereof. Computer programs include software, middleware, firmware, and any combination thereof. Such programs may be provided as signals or network messages and may be recorded on computer readable media such as tangible computer readable media which may store the computer programs in non-transitory form. Hardware includes computers, handheld devices, programmable processors, general purpose processors, application specific integrated circuits, ASICs, field programmable gate arrays, FPGAs, and arrays of logic gates.

Other examples and variations of the disclosure will be apparent to the skilled addressee in the context of the present disclosure.

1. A server comprising: a data store storing: at least one first document, wherein the at least one first document comprises a plurality of item identifiers comprising at least one gram of text; and an association between each gram and its corresponding gram score; and a processor coupled to the data store; wherein the server is configured for natural language processing of text in a query, wherein the query is associated with a user device; wherein the server is configured to respond to the query by sending, to the user device, an indication of an item selected based on natural language processing of grams of text in the query; wherein the natural language processing of the text in the query comprises attributing semantic importance to grams of text in the query, wherein attributing semantic importance to the grams of text in the query comprises: in the event that a number of occurrences of the gram in the at least one first document is above an occurrence threshold, determining a gram score for said gram based on said number of occurrences; in the event that the number of occurrences of the gram in the at least one first document is below the occurrence threshold, determining the gram score based on: (i) said number of occurrences, and (ii) a reference score for the gram based on a number of occurrences of the gram in at least one reference document different to the at least one first document; and attributing the semantic importance based on the gram score.
 2. The server of claim 1 wherein sending, to the user device, the indication of the item comprises providing an output to a resource.
 3. The server of claim 1 wherein the server is configured to: (i) obtain a new item identifier, and (ii) determine a gram score for each gram of text in the new item identifier.
 4. The server of claim 3, wherein the processor is configured to update the association based on grams in the new item identifier and their corresponding determined gram scores.
 5. The server of claim 1, wherein the indication of the item is selected based on semantic importance attributed to grams of text in the plurality of item identifiers in the at least one first document.
 6. The server of claim 5, wherein the query comprises a query item identifier; and wherein selecting the indication of the item comprises selecting an item identifier, or an indication thereof, based on, for each item identifier from a plurality of item identifiers in the at least one first document, a degree of similarity between the query item identifier and said item identifier.
 7. The server of claim 1, wherein the data store comprises a plurality of first documents comprising a plurality of item identifiers; wherein the query comprises a plurality of query item identifiers comprising at least one gram of text; wherein the processor is configured to: determine a gram score for each gram in each query item identifier; determine a degree of similarity between each query item identifier and each of a plurality of item identifiers in at least one of the first documents; select at least one first document based on the determined degree of similarity between each query item identifier and respective item identifiers in each of the at least one first documents; and send, to the user device an indication of at least one of: (i) the at least one selected first document and (ii) the selected plurality of respective item identifiers in said selected first document.
 8. The server of claim 7 wherein each of a plurality of the plurality of first documents is associated with a corresponding facility; wherein the server is configured to receive an acceptance message from the user device in response to the at least one item identifier sent to the user device; and wherein the acceptance message comprises an indication of whether or not the selected plurality of respective item identifiers for one selected first document are approved.
 9. The server of claim 8, wherein, in the event that the acceptance message comprises an approval of the selected plurality of respective item identifiers from the selected first document, the server is configured to send an indication of said plurality of respective item identifiers to the facility corresponding to said first document.
 10. The server of claim 1 wherein in the event that the reference score for a gram of text is greater than a reference threshold value, the processor is configured to determine a lower gram score for said gram.
 11. The server of claim 10, wherein the lower gram score is determined based on the reference score.
 12. The server of claim 1 wherein determining the gram score for a gram of text comprises reducing the value of the gram score in the event that an original value for the gram score would be larger than a gram threshold.
 13. The server of claim 12, wherein the processor is configured to determine the size of the reduction to the gram score based on a difference between the original value for the gram score and the gram threshold.
 14. The server of claim 1 wherein the data store comprises a plurality of trigger grams, wherein the processor is configured to detect a trigger gram in an item identifier and to alter the gram score of any subsequent grams in the item identifier.
 15. The server of claim 14, wherein altering the gram score comprises at least one of: reducing the gram score and separating the item identifier into two portions, a first portion for the grams before the trigger gram and a second portion for the grams after the trigger gram.
 16. The server of claim 1 wherein the number of occurrences for a gram in the at least one first document comprises a total number of item identifiers in the at least one first document which include said gram.
 17. A computer-implemented method of responding to a query associated with a user device, the method comprising: obtaining a query associated with a user device; performing natural language processing of text in the query by attributing semantic importance to a gram of text in the query, wherein attributing semantic importance comprises: in the event that a number of occurrences of the gram in at least one first document is above an occurrence threshold, determining a gram score for said gram based on said number of occurrences; in the event that the number of occurrences of the gram in the at least one first document is below the occurrence threshold, determining the gram score based on: (i) said number of occurrences, and (ii) a reference score for the gram based on a number of occurrences of the gram in at least one reference document different to the at least one first document; and attributing the semantic importance based on the gram score; selecting an indication of an item based on the semantic importance attributed to grams of text in the query; and sending, to the user device associated with the query, an indication of the selected item.
 18. A computer-implemented method for responding to a query, the method comprising: obtaining a query comprising an item identifier relating to a first corpus of text; performing natural language processing of text in the query by attributing semantic importance to grams of text in the query, wherein attributing semantic importance comprises: in the event that a term frequency for the gram in the first corpus of text is above a threshold value, determining a gram score for said gram based on said term frequency; in the event that said term frequency is below the threshold value, determining the gram score based on a term frequency for the gram in a second unrelated corpus of text; and attributing the semantic importance based on the gram score; responding to the query by providing output to a resource, wherein the output is selected based on the semantic importance attributed to grams in the query.
 19. A computer program product comprising program instructions configured to program a processor to perform the method of claim 17.
 20. A computer program product comprising program instructions configured to program a processor to perform the method of claim 18.